Abstract

A coloring of a graph is an assignment of colors to the vertices so that no two adjacent
vertices are given the same color. The problem of coloring a graph with the minimum
number of colors is well known to be NP-hard, even restricted to $k$-colorable graphs for
constant $k \ge 3$. This thesis explores the approximation problem of coloring $k$-colorable
graphs with as few additional colors as possible in polynomial time, focusing on the case of
$k = 3$.

For the worst-case problem, the previous best upper bound on the number of colors
needed for coloring 3-colorable $n$-vertex graphs in polynomial time is $O(\sqrt{n}/\sqrt{\log n})$ colors
by Berger and Rompel, improving a bound of $O(\sqrt{n})$ colors by Wigderson. We present
an algorithm to color any 3-colorable graph with $O(n^{3/8}\,\mathrm{polylog}(n))$ colors, breaking an
"$\tilde{O}(n^{1/2})$ barrier". The algorithm presented here is based on examining second-order
neighborhoods of vertices, rather than just immediate neighborhoods of vertices as in
previous approaches. We extend our results to improve the worst-case bounds for coloring
$k$-colorable graphs for constant $k > 3$ as well.

We also examine the problem of coloring random $k$-colorable graphs. We consider a
standard model in which vertices are first randomly assigned to one of $k$ color classes
and then each edge between two vertices of different color is placed into the graph with
probability $p$. For sufficiently high edge probability, it is known by results of Turner, Dyer
and Frieze, and others, that such graphs are easy to $k$-color. We describe here an algorithm
to $k$-color graphs generated in this way for a much wider range of edge probabilities
($p > n^{-1+\epsilon}$ for any constant $\epsilon > 0$) than previously possible.

To study a wider variety of graph distributions, we also present a model of graphs
generated by the semi-random source of Santha and Vazirani that provides a smooth transition
between the worst-case and random models. In this model, the graph is generated by a
"noisy adversary": an adversary whose decisions (whether or not to insert a particular
edge) have some small (random) probability of being reversed. We show that even for quite
low noise rates, semi-random $k$-colorable graphs can be colored with high probability using
just $k$ colors.

Finally, we use assumptions about the worst-case difficulty of approximate graph
coloring to provide lower bounds for other hard problems. Using techniques developed by
Berman and Schnitger, we show that if there is no polynomial-time algorithm to color
$k$-colorable graphs with $O(\log n)$ colors, then the largest independent set in a graph (or
equivalently the largest clique) cannot be approximated to within a factor of $n^{1-\epsilon}$ for any
constant $\epsilon > 0$. This is a much higher lower bound than achieved by previous results, albeit
based on less solid assumptions.

Thesis Supervisor: Ronald L. Rivest 

Title: Professor of Electrical Engineering and Computer Science 
Chapter 1

Introduction 



A $k$-coloring of a graph is an assignment of one of $k$ distinct colors to each vertex in the
graph so that no two adjacent vertices are given the same color. The chromatic number of
a graph is the smallest $k$ such that the graph can be $k$-colored.

Graph coloring problems have a long history in mathematics and computer science. 
The famous 4-Color Problem, of whether every planar graph is 4-colorable, dates back at
least to 1852 [33]. Partly through that problem, finally solved by Appel and Haken [3], 
graph coloring has become a central topic in combinatorics. In computer science, graph 
coloring problems have long been known to model various scheduling problems such as 
examination scheduling and register allocation. Graph coloring is also closely related to 
other combinatorial problems such as finding the maximum independent set in a graph (the 
largest set of vertices such that no two have an edge between them). 

Unfortunately from the algorithmic point of view, as is well known, the problem of
determining the chromatic number of a graph is NP-complete. The problem of deciding
whether a graph is $k$-colorable for any fixed $k \ge 3$ is NP-complete as well. Thus, coloring
an arbitrary $k$-colorable graph with $k$ colors for $k \ge 3$ cannot be done in polynomial time
unless P = NP (for $k = 2$, 2-coloring is easy). Knowing that the coloring problem is
NP-hard does not make it disappear, however, and it also does not necessarily mean nothing
useful can be done. It does mean that, as for many other famous hard problems such as the
Traveling Salesman Problem (TSP) and the Bin Packing problem, researchers attempting
to find good fast algorithms must consider issues of approximation.

This thesis concerns the algorithmic problem of finding good approximate colorings of 
graphs for several natural forms of approximation. We focus here on deriving
polynomial-time algorithms for coloring graphs of constant chromatic number and on improving upon
previously known algorithmic guarantees. In particular, we both improve upon previous
guarantees for the number of colors needed in the worst case to properly color $k$-colorable
graphs in polynomial time, and extend the known classes of graphs for which optimal
colorings can be found quickly. We will not be so concerned here with precisely optimizing
the running time of the algorithms (so long as they are polynomial); instead we focus more 
on the quality of the approximation. Because 3-chromatic graphs are the simplest and in
a sense the most fundamental graphs for which optimal coloring is NP-hard, much of this 
thesis will focus on the special case of coloring graphs of chromatic number 3. We then 
describe extensions of these results to graphs of higher constant chromatic number as well. 

1.1 Applications of graph coloring 

Graph coloring problems arise in situations where one would like to assign a small number 
of values to objects under pairwise constraints of the form that object x and object y cannot 
receive the same value. Such situations occur often in various scheduling problems and we 
present a few examples here. 

Example 1: Examination Scheduling. 

Consider the problem of scheduling n final exams into a small number of different 
time slots. One would like to do so in a way such that no student has a conflicting 
schedule: that is, no student has two of her examinations at the same time. Suppose 
we assign one vertex in a graph to each examination and place an edge between two 
vertices if some student is taking both corresponding exams. Then the problem of 
scheduling the examinations into k time slots so no student has a conflict is exactly 
the problem of coloring the corresponding graph with k colors [42] [5]. 
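To make the correspondence concrete, here is a small Python sketch with made-up enrollment data. The student and exam names, and the helpers `exam_conflict_graph` and `is_valid_schedule`, are purely illustrative. The sketch builds the conflict graph as a dict mapping each exam to the set of exams it conflicts with. It then checks that a slot assignment is conflict-free exactly when it is a proper coloring of that graph.

```python
from itertools import combinations


def exam_conflict_graph(enrollments):
    """One vertex per exam; an edge joins two exams that some student takes both of."""
    adj = {}
    for exams in enrollments.values():
        for e in exams:
            adj.setdefault(e, set())
        for e, f in combinations(sorted(set(exams)), 2):
            adj[e].add(f)
            adj[f].add(e)
    return adj


def is_valid_schedule(adj, slot):
    """No student has a conflict exactly when adjacent exams get different slots (colors)."""
    return all(slot[e] != slot[f] for e in adj for f in adj[e])


# Hypothetical data: algebra, biology and chemistry pairwise conflict, so 3 slots are needed.
students = {
    "ann": ["algebra", "biology"],
    "bob": ["biology", "chemistry"],
    "cy": ["algebra", "chemistry"],
    "dee": ["drama"],
}
G = exam_conflict_graph(students)
print(is_valid_schedule(G, {"algebra": 0, "biology": 1, "chemistry": 2, "drama": 0}))  # True
print(is_valid_schedule(G, {"algebra": 0, "biology": 1, "chemistry": 0, "drama": 0}))  # False
```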

Example 2: Register Allocation. 

A more "real computer science" problem, for which graph coloring techniques have
actually been used in practice, is the problem of register allocation in compilers. Work
in this direction has been done by several researchers including Chaitin [15], Chaitin et 
al. [16], and Briggs et al. [13]. During compilation, a standard compiler [15] transforms 
the source program into an intermediate language based on a hypothetical machine 
with an unlimited number of fast syntactic (virtual) registers. Since the real machine 
has only a bounded number of registers, the compiler in a "register allocation phase" 
must then map the computed values in the syntactic registers into the true registers of 
the machine (e.g., 17 registers in work of Chaitin et al. [16] or 32 registers in work of 
Chaitin [15]). If the compiler cannot do this exactly, it will be forced to "spill" some 
values into main memory through load and store operations. Because the registers 
are fast, the hope is to spill as little of the computation as possible. 

The relationship of this problem to graph coloring is as follows. For each procedure in 
a program, Chaitin et al. build a "register interference graph" containing one vertex 
for each value (e.g. a variable in the program) and an edge between two vertices if 
the two values interfere and cannot be placed into the same register. Interference is
checked roughly by examining if both values are "live" at the same time, or more 
precisely if one value is live at a definition point of the other. Thus, if we think of 
the real registers as colors, an assignment of 17 or 32 colors to the vertices of the 
interference graph (depending on which machine is being used) corresponds to an 
assignment of registers to the computed values of the procedure. 
Of course, it may be that the interference graph cannot be colored with the required 
number of colors. In that case, the uncolored vertices are "spilled" into main memory. 
So, the goal here is to color as many vertices as possible with the given number of 
colors (where "many" may be defined by some additional measure of cost and not 
just sheer quantity). As it turns out, spilling values requires additional
vertices, usually of low degree, to be added to the graph, so the abstraction as
a standard coloring problem is not quite exact. Nonetheless, simple graph coloring
heuristics appear to work well in practice [15][16][13]. 

1.2 Forms of approximation, and past work 

For the graph coloring problem, the issue of approximation splits naturally into two general 
directions. One direction is to consider worst-case graphs, but allow the number of colors
used to be non-optimal. In particular, one would like answers to the question: 

Given an $n$-vertex $k$-colorable graph, how many colors do you need in order to
color the graph in polynomial time? 

A second general direction is to relax the restriction that the graph be worst case and 
attempt to find optimal colorings for large or nicely characterized subsets of the inputs. 
That is, one would like answers to the question: 

While coloring $k$-colorable graphs with $k$ colors in the worst case is hard, can
you find a large subset of cases where $k$-coloring is easy?

1.2.1 Approximate coloring in the worst case 

For graphs of constant chromatic number, the first nontrivial result along the first direction
presented here was due to Wigderson [43]. Wigderson gives an algorithm to color any
$n$-vertex 3-colorable graph with $O(\sqrt{n})$ colors, and more generally to color any $k$-colorable
graph with $O(n^{1-1/(k-1)})$ colors. More recently, several researchers (Berger and Rompel [6],
Linial, Saks, and Wigderson [24], and Raghavan [32]) have independently improved upon
this bound to color $k$-colorable graphs with $O((n/\log n)^{1-1/(k-1)})$ colors, which for $k = 3$
results in a coloring of 3-colorable graphs with $O(\sqrt{n}/\sqrt{\log n})$ colors.
The result of Berger and Rompel et al. was important because no progress had been
made for some time and it showed that $\sqrt{n}$ was in no sense a lower bound for
coloring 3-colorable graphs. However, for the kinds of techniques used it was clear that, say,
$O(\sqrt{n}/\log^2 n)$ colors would be completely out of reach. The difficulty in improving these
results motivated work of Linial and Vazirani [25], who provide some evidence for an $n^{\epsilon}$
lower bound for the general chromatic number approximation problem.

For general graphs of arbitrary chromatic number, the best algorithmic result known to
date is due to Halldorsson [22]. Halldorsson's algorithm has a performance guarantee (that
is, a ratio of the number of colors used to the chromatic number) of $O(n(\log\log n)^2/(\log n)^3)$.
This result is based upon an algorithm by Boppana and Halldorsson [12] for the Independent
Set problem, which finds an independent set within an $n/(\log n)^2$ factor of the maximum.

There has also been recent work on coloring graphs presented in an on-line manner; that 
is, coloring graphs presented one vertex at a time in some arbitrary order. Vishwanathan [41] 
presents an algorithm for such a model that uses a number of colors within a logarithmic 
factor of the Wigderson bound. 

1.2.2 Exact coloring in special cases 

Many classical results on graph coloring can be thought of from the point of view of the 
second direction described here. These results prove nice characterizations that are sufficient 
conditions for $k$-colorability, and the characterizations are often testable in polynomial time.
For example, the famous 4-Color Problem and Theorem gives an easy way to prove a graph 
to be 4-colorable — one simply checks that the graph is planar. In fact, the 4-Color Theorem 
of Appel and Haken is known to yield a polynomial-time coloring algorithm for such graphs. 
Of course, if the graph turns out not to be planar, then this technique says nothing about 
the graph's chromatic number. For graphs of chromatic number 3, similar classical results 
are known. Grötzsch ([5], p. 355) proved that any planar graph without triangles must be
3-colorable, and this was extended to hold for graphs with at most 3 triangles by Grünbaum
[21].¹ The proofs of both results involve reducing a graph to one with fewer vertices in ways
that yield polynomial-time coloring algorithms. For graphs of general chromatic number,
Brooks' Theorem [14][26] states that any connected graph of maximum degree $d$ ($d > 2$) is
either $d$-colorable or else is a single $(d+1)$-clique. Note that it is very easy to $d$-color any
graph with maximum degree $d - 1$: for each vertex in an arbitrary order, simply give to
that vertex any color in $\{1,\ldots,d\}$ not held at the time by any of its neighbors. Steinberg
[37] presents a survey of such classical results, focusing on 3-chromatic graphs. 



¹ In some sense this is "best possible" since the 4-clique $K_4$ is planar with four triangles and is not
3-colorable. See [5]. 
Instead of presenting such explicit characterizations of easy-to-color families of graphs,
one can also study random families of graphs. Turner [38], Kucera [23], and Dyer and
Frieze [18] give polynomial-time algorithms that color random $k$-colorable graphs with $k$
colors with high probability, for any constant $k$. So, most $k$-colorable graphs are easy to
$k$-color. In fact, Dyer and Frieze go further and provide an algorithm that, when amortized
over all $n$-vertex $k$-colorable graphs, spends on average polynomial time per graph. Petford
and Welsh [31] present experimental work using heuristics for coloring random 3-colorable 
graphs and claim success for a wide range of edge probabilities. 

It is not known how to color general random graphs (where we do not restrict the
chromatic number) in polynomial time with the minimum number of colors, but one can get fairly close.
For the model $G(n,p)$ of an $n$-vertex graph in which each edge is included with probability
$p$, Bollobas [10] has shown that the chromatic number will be $(1 + o(1))\,n/(2\log_b n)$ with
high probability, for $b = 1/(1-p)$. It is not hard to show that the greedy algorithm (in some
order, give to each vertex the color of least index not yet held by any of its neighbors) finds
a coloring with at most $(1 + o(1))\,n/\log_b n$ colors, a factor of 2 above optimal. Matula [27]
provides quasi-polynomial approaches with provably better bounds. 
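The greedy (first-fit) procedure just described is short to state in code. The following is a minimal Python sketch. It assumes a graph stored as a dict mapping each vertex to the set of its neighbors, and the names `gnp` and `greedy_color` are illustrative, not from the thesis.

The sketch samples $G(n,p)$, runs first-fit in vertex order, and prints the number of colors next to $n/\log_b n$ and $n/(2\log_b n)$ for comparison. Each vertex receives a color index no larger than its degree. So the same routine also realizes the easy $d$-coloring of graphs of maximum degree $d-1$ mentioned earlier.

```python
import math
import random


def gnp(n, p, seed=0):
    """Sample G(n, p): each of the n(n-1)/2 vertex pairs is an edge independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for w in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(w)
                adj[w].add(u)
    return adj


def greedy_color(adj, order=None):
    """First-fit: give each vertex the least color index not yet held by any of its neighbors."""
    color = {}
    for v in (order if order is not None else sorted(adj)):
        taken = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in taken:  # at most d(v) indices are blocked, so c <= d(v)
            c += 1
        color[v] = c
    return color


if __name__ == "__main__":
    n, p = 1000, 0.5
    coloring = greedy_color(gnp(n, p))
    b = 1 / (1 - p)
    print("greedy colors:  ", len(set(coloring.values())))
    print("n / log_b n:    ", round(n / math.log(n, b)))
    print("n / (2 log_b n):", round(n / (2 * math.log(n, b))))
```

At moderate $n$ the lower-order terms hidden in the $o(1)$ are not negligible. The printed numbers are therefore a rough illustration, not a check of the asymptotic constants.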

1.3 New results and a plan of the thesis 

This thesis presents results in both of the two directions discussed above. For $k$-colorable
graphs for constant k, we both provide better approximation guarantees for the worst-case 
problem and expand the classes of graphs for which optimal coloring is known to be easy. 

The major portion of this thesis concerns the first direction discussed above, that of finding
improved approximation guarantees for the worst-case problem. We present an algorithm
that uses a quite different strategy from that used by the algorithms of Wigderson and
Berger and Rompel and others, and colors any 3-colorable graph with $O(n^{3/8}\log^{5/2} n)$ colors.
Thus, we improve the previous bound of $O(\sqrt{n}/\sqrt{\log n})$ colors and break a "soft-$O(\sqrt{n})$
barrier" (that is, ignoring polylogarithmic factors). The algorithm we present also extends
to graphs of higher constant chromatic number and improves upon the previous bounds for
such graphs. We present the new algorithm in two parts: the first part (Chapter 4) colors
3-colorable graphs with $O(n^{2/5+o(1)})$ colors, and the second part (Chapter 5) achieves the
better bound claimed above. The strategy used also suggests a plausible path for further 
significant reductions in the color bounds, and a discussion of this is given in Chapter 
10. The algorithms presented for the worst-case problem are motivated by techniques that 
would work if the graph were in fact chosen randomly, and this motivation and the general 
flavor of the algorithms are given in Chapter 3. 

Along the second direction, we extend the class of randomly chosen $k$-colorable graphs for
which a $k$-coloring can be found in polynomial time. In particular, we consider a standard
model of a random $k$-colorable graph in which vertices are first randomly assigned to one of
$k$ color classes and then each edge between two vertices of different color is placed into the
graph with probability $p$. For this model, we are able to find colorings for a wider range of
edge probabilities ($p > n^{-1+\epsilon}$ for any constant $\epsilon > 0$) than was previously known. These
results are described in Chapter 7. 

While the known results on random graphs imply that most $k$-colorable graphs are easy
to $k$-color, random $k$-colorable graphs tend to be of a very special type. For example, with
high probability all vertices of a random $k$-colorable graph have nearly the same degree and
vertices of the same color class all have nearly the same number of common neighbors. So, 
graphs created in only a "somewhat random" manner may not be colored well by algorithms 
for the random case. To explore a wider variety of graph distributions, we present in Chapter 
8 a model of graphs created by the semi-random source of Santha and Vazirani [34] that 
provides a smooth transition between the worst-case and random models. In this model, 
the graph is generated by a "noisy adversary" — an adversary whose decisions (whether 
or not to insert a particular edge) have some small probability of being reversed. We show 
that even for quite low noise rates, these semi-random $k$-colorable graphs can be colored
with high probability using just k colors. The discussion of random and semi-random graph 
models is based in part on work joint with Joel Spencer. 

In addition to the above-mentioned general directions, we describe in Chapter 9 how 
hardness assumptions for approximately coloring graphs in the worst case can be used to 
provide lower bounds for other hard problems. In particular, we use a technique developed 
by Berman and Schnitger [7] to prove the following result. Suppose there were a
polynomial-time algorithm to find an independent set in a graph of size at most a factor of $n^{1-\epsilon}$ smaller
than the size of the largest independent set, for some constant $\epsilon > 0$. Then one could
convert such a procedure into one that colors $k$-colorable graphs with $O(\log n)$ colors, for any
constant $k$. Also, one could convert such a procedure into one that colors $(\log n)$-colorable
graphs with polylog($n$) colors. This contrasts with the best algorithm known to date [22] for
coloring $(\log n)$-colorable graphs, which uses more than $n/(\log n)^2$ colors. So, these results
imply that a seemingly small improvement in approximating independent sets implies one 
can get a much larger improvement for approximate graph coloring. In contrapositive 
form, these results present a high lower bound for Independent Set approximation based
on a hardness assumption for graph coloring that is quite far from the best algorithmic 
guarantees currently known. 

Some of the work in this thesis has previously appeared in extended abstract form [8] [9]. 



Chapter 2 

Notation, definitions, and previous algorithms 



In this chapter we review some standard graph-theoretic definitions and introduce basic 
notation that will be used throughout this thesis. At the end of the chapter we will describe 
some previous worst-case coloring algorithms in order to introduce a few useful techniques. 
Given a graph G, let V(G) denote the vertices of G and E(G) denote the edges of G. 
We will use N(v) to denote the neighborhood of a vertex v and d(v) to denote the vertex 
degree. That is, for G = (V, E): 

• $N(v) = \{w \in V \mid (v,w) \in E\}$, and

• $d(v) = |N(v)|$.

It will also be convenient to define the degree D(S) of a set of vertices S by: 
• $D(S) = \sum_{v \in S} d(v)$,

and the neighborhood N(S) of set S by: 

• $N(S) = \bigcup_{v \in S} N(v) = \{w \in V \mid (v,w) \in E \text{ for some } v \in S\}$.

Notice that $D(S)$ may be much larger than $|N(S)|$ if vertices in $S$ share many neighbors in
common. We will also use the term "distance-2 neighbors" of a vertex $v$ to mean the set
$N(N(v))$. Note that if $N(v) \neq \emptyset$ then $v \in N(N(v))$.
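The notation above translates directly into set operations. Below is a minimal Python sketch. It assumes the graph is stored as a dict from each vertex to the set of its neighbors; that representation and the function names are assumptions of this example, not of the text.

The small example checks both remarks above on a 4-cycle. Shared neighbors make $D(S)$ exceed $|N(S)|$, and $v$ lies in $N(N(v))$.

```python
def neighborhood(adj, v):
    """N(v) = {w in V | (v, w) in E}."""
    return set(adj[v])


def degree(adj, v):
    """d(v) = |N(v)|."""
    return len(adj[v])


def set_degree(adj, S):
    """D(S) = sum of d(v) over v in S; a neighbor shared by several vertices of S is counted repeatedly."""
    return sum(len(adj[v]) for v in S)


def set_neighborhood(adj, S):
    """N(S) = union of N(v) over v in S."""
    out = set()
    for v in S:
        out |= adj[v]
    return out


def distance2(adj, v):
    """The distance-2 neighbors N(N(v))."""
    return set_neighborhood(adj, adj[v])


# The 4-cycle 0 - 1 - 3 - 2 - 0: vertices 1 and 2 share both of their neighbors.
cycle = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
S = {1, 2}
print(set_degree(cycle, S), len(set_neighborhood(cycle, S)))  # 4 2
print(distance2(cycle, 0))                                    # {0, 3}
```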

An independent set in a graph is a set of vertices no two of which are adjacent to each 
other. A vertex cover is a set W such that every edge in the graph has at least one endpoint 
in W; that is, it is a set W such that V - W is independent. 
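As a small illustration of the last remark, the sketch below checks both properties. It uses the same hypothetical dict-of-sets representation, and the helper names are illustrative. It also confirms, by brute force over all subsets of a 4-cycle, that $W$ is a vertex cover exactly when $V - W$ is independent.

```python
from itertools import combinations


def is_independent(adj, S):
    """True if no two vertices of S are adjacent."""
    S = set(S)
    return all(not (adj[v] & S) for v in S)


def is_vertex_cover(adj, W):
    """True if every edge (u, w) has at least one endpoint in W."""
    W = set(W)
    return all(u in W or w in W for u in adj for w in adj[u])


cycle = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
V = set(cycle)
for r in range(len(V) + 1):
    for W in combinations(sorted(V), r):
        # W covers every edge exactly when no edge lies entirely inside V - W.
        assert is_vertex_cover(cycle, W) == is_independent(cycle, V - set(W))
print(is_vertex_cover(cycle, {0, 3}), is_independent(cycle, {1, 2}))  # True True
```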

As mentioned in the introduction, the chromatic number of a graph is the least number 
of colors needed to color the graph so that no two adjacent vertices are given the same 
color. As is standard terminology [29], we will say that a graph is k-chromatic to mean 
that the chromatic number is exactly k, and that a graph is k-colorable to mean that the 
chromatic number is at most k. For the most part, this distinction will not be important 
and we will use the terms interchangeably. We say that an algorithm t-colors a graph if it
colors the graph with at most t colors, and it optimally colors a graph if it colors with the
fewest colors possible.

For the special case where G is a 3-colorable graph, we use red, blue, and green to denote 
the colors of vertices in G under some legal (but unknown) 3-coloring. We also use these 
terms to denote the sets of vertices belonging to each color class under that legal coloring. 

For functions $f$ and $g$ we say $g(n) = \tilde{O}(f(n))$ to denote that $g(n) = O(f(n)\log^c n)$ for
some constant $c$. Similarly, we will use $g(n) = \tilde{\Omega}(f(n))$ to denote that $g(n) = \Omega(f(n)/\log^c n)$
for some constant $c$. We also use "$g(n) \gg f(n)$" to mean that $f(n) = o(g(n))$. Finally, we
use the following general standard notation: 

• $(m)_i = m(m-1)(m-2)\cdots(m-i+1)$.

• $K_t$ is the clique on $t$ vertices.

• For $S$ a subset of vertices of graph $G$, the graph $H = G|_S$ is the subgraph of $G$ induced
by set $S$. That is, $V(H) = S$ and $E(H) = \{(i,j) \in E(G) \mid i,j \in S\}$.

The term "$\log n$" will be used to denote $\log_2 n$, and $\log^p n$ will be used to denote $(\log n)^p$.

2.1 Previous algorithms 

As is well known, 2-colorable graphs can easily be 2-colored in polynomial time. For
example, the following procedure suffices to color any 2-colorable graph with the colors 0 and 1.
First, assign a color, say 0, to one vertex in each connected component in the graph. Then
assign color 1 to each neighbor of a vertex colored 0. Finally, repeat, assigning color 0 to
any uncolored neighbor of a vertex of color 1, and color 1 to any uncolored neighbor of a
vertex colored 0, and so on, until the entire graph is colored. The resulting coloring will be
legal since 2-colorable graphs have no odd cycles.
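The procedure above is breadth-first search from one vertex per component. Below is a minimal Python sketch; the dict-of-sets adjacency is assumed and `two_color` is a hypothetical name. The description assumes a 2-colorable input, but this version also reports failure: it returns `None` when it meets an edge whose endpoints received the same color. That happens exactly when some component contains an odd cycle.

```python
from collections import deque


def two_color(adj):
    """Return a dict vertex -> 0/1 properly 2-coloring the graph, or None if it is not 2-colorable."""
    color = {}
    for s in adj:
        if s in color:
            continue
        color[s] = 0                          # one vertex per connected component gets color 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]   # uncolored neighbors get the opposite color
                    queue.append(w)
                elif color[w] == color[u]:
                    return None               # an odd cycle: no legal 2-coloring exists
    return color


hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(sorted(two_color(hexagon).items()))  # [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (5, 1)]
print(two_color(triangle))                 # None
```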

Let us now review Wigderson's algorithm [43] for the special case of 3-colorable graphs. 
Wigderson's algorithm looks at the immediate neighborhoods of vertices, and uses the fact 
that in a 3-colorable graph the neighborhood of any vertex is 2-colorable. The algorithm 
proceeds as follows. If there exists a vertex of degree at least $\sqrt{n}$ in the graph, then we
color its neighborhood with two unused colors and then delete the colored nodes from the
graph. If all vertices have degree less than $\sqrt{n}$, we can greedily $\sqrt{n}$-color the remaining
graph, since with $\sqrt{n}$ colors, for each vertex we are guaranteed that at least one color is not
used on its neighbors. The total number of colors used is at most $3\sqrt{n}$. If we pick a degree
cutoff of $\sqrt{2n}$ instead of $\sqrt{n}$, we can optimize the constant for this type of strategy to $\sqrt{8}$.
A more formal description of the algorithm is given below.
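The constants $3$ and $\sqrt{8}$ above come from balancing the two phases. With a degree cutoff $t$, each pass of the loop deletes at least $t$ vertices and spends two colors, so there are at most $n/t$ passes. The final greedy phase then needs at most $t$ colors:

```latex
\#\text{colors}(t) \;\le\; \frac{2n}{t} + t,
\qquad
\frac{d}{dt}\left(\frac{2n}{t} + t\right) = 1 - \frac{2n}{t^{2}} = 0
\;\Longrightarrow\; t = \sqrt{2n},
\qquad
\frac{2n}{\sqrt{2n}} + \sqrt{2n} = 2\sqrt{2n} = \sqrt{8}\,\sqrt{n}.
```

Taking $t = \sqrt{n}$ instead gives $2\sqrt{n} + \sqrt{n} = 3\sqrt{n}$.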
Wigderson's Algorithm

Given G = (V, E), a 3-colorable graph on n vertices. 

1. Initialize color $c$ to 0.

2. While there exists a vertex $v \in V$ with $d(v) \ge \sqrt{n}$,
(a) 2-color $N(v)$ with colors $c$, $c+1$.
(b) Let $c \leftarrow c + 2$, $V \leftarrow V - N(v)$.

(note that the loop in this step can be executed at most $\sqrt{n}$ times.)

3. Color the remaining graph with colors $c, c+1, \ldots, c+\sqrt{n}-1$, by arbitrarily
assigning to each vertex a color not held by any of its neighbors.
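Below is a minimal Python sketch of the three steps above. It assumes a dict-of-sets adjacency for a graph that really is 3-colorable, and the function names are hypothetical.

- It uses the cutoff $\lceil\sqrt{n}\rceil$.
- It measures degrees in the graph that remains after earlier deletions.
- It raises an error if some neighborhood turns out not to be 2-colorable, which certifies that the input was not 3-colorable.

```python
import math


def two_color_induced(adj, S):
    """BFS 2-coloring (labels 0/1) of the subgraph induced by the vertex set S; None on an odd cycle."""
    from collections import deque
    color = {}
    for s in S:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u] & S:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return None
    return color


def wigderson(adj):
    """Color a 3-colorable graph given as {vertex: set of neighbors}; returns {vertex: color index}."""
    n = len(adj)
    cutoff = max(1, math.ceil(math.sqrt(n)))
    alive = set(adj)  # the vertex set V of the pseudocode, shrinking as Step 2 deletes N(v)
    color = {}
    c = 0             # Step 1
    while True:       # Step 2: look for a vertex with at least `cutoff` neighbors still alive
        v = next((u for u in alive if len(adj[u] & alive) >= cutoff), None)
        if v is None:
            break
        nbrs = adj[v] & alive
        halves = two_color_induced(adj, nbrs)
        if halves is None:
            raise ValueError("N(v) is not 2-colorable, so the graph is not 3-colorable")
        for u, side in halves.items():
            color[u] = c + side        # Step 2(a): colors c and c+1
        c += 2                         # Step 2(b)
        alive -= nbrs
    for u in alive:                    # Step 3: every alive vertex has fewer than `cutoff` alive neighbors
        taken = {color[w] for w in adj[u] & alive if w in color}
        color[u] = min(k for k in range(c, c + cutoff) if k not in taken)
    return color
```

Each pass of Step 2 removes at least $\lceil\sqrt{n}\rceil$ vertices. So on a 3-colorable input `wigderson(adj)` uses at most $2\lfloor n/\lceil\sqrt{n}\rceil\rfloor + \lceil\sqrt{n}\rceil$ colors, which is the $3\sqrt{n}$ bound up to rounding.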

The improvement to $O(\sqrt{n}/\sqrt{\log n})$ of Berger et al. mentioned previously results from
choosing $\log n$ starting vertices instead of one. This can be done by selecting an arbitrary
subset of vertices of size $3\log n$, and trying each subset of size $\log n$; one such subset must
be monochromatic under some legal 3-coloring of $G$ and so has a 2-colorable neighborhood.
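The enumeration step can be sketched as follows, in Python with a dict-of-sets adjacency; the names are hypothetical. The sketch only illustrates how a suitable starting set is found, not how [6] then uses it.

With $k = \log n$ vertices per subset, the number of subsets tried is $\binom{3k}{k} \le (3e)^{k} = n^{\log_2(3e)} < n^{3.1}$. The search therefore stays polynomial.

```python
from collections import deque
from itertools import combinations


def is_bipartite(adj, S):
    """BFS check that the subgraph induced by the vertex set S is 2-colorable."""
    S = set(S)
    side = {}
    for s in S:
        if s in side:
            continue
        side[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u] & S:
                if w not in side:
                    side[w] = 1 - side[u]
                    queue.append(w)
                elif side[w] == side[u]:
                    return False
    return True


def subset_with_bipartite_neighborhood(adj, candidates, k):
    """Try every k-subset T of `candidates` (3k vertices in the text, with k = log n) and return
    the first whose neighborhood N(T) induces a 2-colorable subgraph, or None.
    For a 3-colorable graph with len(candidates) >= 3k, pigeonhole gives a monochromatic
    k-subset; its neighborhood avoids that color, so the search cannot fail."""
    for T in combinations(candidates, k):
        nbhd = set().union(*(adj[v] for v in T))
        if is_bipartite(adj, nbhd):
            return set(T)
    return None
```

The returned set need not itself be monochromatic. The search only guarantees the property the algorithm actually uses, a 2-colorable neighborhood.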
The way that this set is then exploited is described in [6]. We will revisit this algorithm in 
Chapter 3, where the algorithm and bounds guaranteed follow as an easy corollary of the 
machinery described there. 

In contrast to the above strategies, the new worst-case algorithm presented here is a 
multi-pronged attack. The main idea of the new approach is to take advantage of
information from not just the immediate neighbors of vertices, but from distance-2 neighbors as
well. One difficulty with looking at distance-2 neighbors is that they do not have as obvious
a structure as the immediate neighbors. For example, the immediate neighborhood, as 
noted earlier, is 2-colorable; the structure of the distance-2 neighbors will have to be more 
carefully brought out. 



Chapter 3 

Worst-case bounds: preliminaries 

3.1 New worst-case approach: the basic idea 

The previous best algorithms for coloring 3-colorable graphs all used $\tilde{O}(n^{1/2})$ colors in the
worst case. This section describes the basic idea for an algorithm to color any $n$-vertex
3-colorable graph $G$ with $\tilde{O}(n^{a})$ colors, for some $a < 1/2$. Note that to do so, it is enough,
as in Wigderson's algorithm, to find an independent or 2-chromatic set of size $\tilde{\Omega}(n^{1-a})$,
since that set can be colored with 1 or 2 colors and the procedure repeated on the graph
remaining. 

The idea of the new algorithm is to try to make progress from examining distance-2 
neighbors, and not just the immediate neighborhoods of vertices as in previous algorithms. 
We will describe the motivation for the approach by considering the question: "what if 
the edges in the graph were distributed randomly?" That is, what if after an adversary 
decided which nodes to place in the sets red, blue, and green (the color classes under a legal 
3-coloring unknown to the algorithm) a coin of some bias p was then flipped for each pair 
of vertices u, v of different colors to determine whether edge (u, v) would be in the graph? 
In that case, the following strategy finds an independent set of size $\tilde{\Omega}(n^{2/3})$.

First, we may assume there are about the same number of red, blue, and green vertices,
since otherwise we could immediately separate at least one of the color classes from the
others by just looking at the vertex degrees.¹ Second, we may assume that the vertices have
average degree at least $n^{1/3}$, since otherwise we could just greedily gather an independent
set of size $\Omega(n^{2/3})$. Finally, for simplicity, we assume that the average degree $d$ is at most
$n^{1/2-\epsilon}$ for some $\epsilon > 0$ (so we have $n^{1/3} \le d \le n^{1/2-\epsilon}$). This last requirement will simplify
the motivational argument, but is not necessary.
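The greedy step used in the second assumption can be sketched in Python as follows (dict-of-sets adjacency; the name is hypothetical). The method repeatedly takes a vertex of minimum remaining degree and deletes its neighbors. This is known to yield at least $n/(\bar{d}+1)$ vertices on a graph of average degree $\bar{d}$ (Turán's bound), so $\bar{d} < n^{1/3}$ gives $\Omega(n^{2/3})$.

```python
def greedy_independent_set(adj):
    """Min-degree greedy: pick a vertex of least degree in the remaining graph, keep it,
    then delete it and its neighbors; repeat until no vertices remain."""
    alive = set(adj)
    chosen = []
    while alive:
        v = min(alive, key=lambda u: len(adj[u] & alive))
        chosen.append(v)
        alive -= adj[v] | {v}
    return chosen


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(sorted(greedy_independent_set(star)))  # [1, 2, 3]
```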

Suppose $v$ is a red vertex. Then, the neighborhood of $v$ consists of blue and green
vertices, with approximately half of each color if the numbers of blue and green vertices
in the graph are roughly equal. Each blue vertex in $N(v)$ similarly has about half green
neighbors and half red neighbors, and each green vertex has about half blue neighbors and
half red neighbors. So, if we look at the set of the distance-2 neighbors $S = N(N(v))$, red
vertices are significantly more predominant than blue or green vertices. In fact, about half
of $S$ is red, a quarter blue, and a quarter green, since we have assumed $d$ is small enough (at
most $n^{1/2-\epsilon}$) that not many vertices of $S$ are neighbors of several vertices of $N(v)$. Thus,
$S$ is a set of size at least $\Omega(n^{2/3})$ that has within it an independent set (the red vertices) of
about one half the size of $S$.²

¹ Once we have separated one of the color classes from the others, we can then easily 2-color the graph
remaining. This fact about the sizes of the color classes for random graphs does not generalize to worst-case
graphs, and in fact, there is no analog of this step used in the worst-case algorithm. It is inserted here solely
to simplify our picture of the graph.

Given a set $S$ of size $\Omega(n^{2/3})$ containing an independent set of size $\frac{1}{2}|S|$, and therefore
a vertex cover of size $\frac{1}{2}|S|$, we can algorithmically find an independent set of size $\tilde{\Omega}(n^{2/3})$
by applying a vertex-cover approximation algorithm due to Bar-Yehuda and Even [4] and
(independently) to Monien and Speckenmeyer [28].³ Their algorithm finds a vertex cover
of size at most $2 - \frac{\log\log n}{2\log n}$ times the size of the minimum vertex cover in an $n$-vertex graph. If we
apply the algorithm to the graph induced by $S$, we find a vertex cover $W$ in $S$ of size at most
$\frac{1}{2}|S|\left(2 - \frac{\log\log|S|}{2\log|S|}\right) = |S| - \frac{|S|\log\log|S|}{4\log|S|}$, which is at most $|S| - |S|/(4\log|S|)$ once $|S| \ge 4$. So, the complement, $S - W$, is an
independent set inside $S$ of size $\Omega(|S|/\log|S|) = \tilde{\Omega}(n^{2/3})$. Thus, in the case where
the edges in the graph are chosen by a random process, we have found a large independent
set. In Chapter 7, we see how in fact to do much better for random graphs and actually
3-color random 3-colorable graphs for $p > n^{-1+\epsilon}$ (i.e., where the average degree is at least
$n^{\epsilon}$ for some $\epsilon > 0$).
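The claim that about half of $N(N(v))$ is red can be checked numerically. The Python sketch below is an illustration under simplifying assumptions, and the function names are hypothetical.

- It fixes the three color classes to equal sizes (round-robin) instead of assigning vertices at random.
- It picks $p$ so that the expected degree is 15, which for $n = 1500$ lies between $n^{1/3} \approx 11.4$ and $n^{1/2} \approx 38.7$.

It reports the red fraction of $N(N(v))$ averaged over red vertices $v$. The argument above predicts a value close to one half.

```python
import random


def random_3colorable(n, p, seed=0):
    """Classes 0/1/2 (red/blue/green) assigned round-robin; each pair of differently
    colored vertices becomes an edge independently with probability p."""
    rng = random.Random(seed)
    cls = [v % 3 for v in range(n)]
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for w in range(u + 1, n):
            if cls[u] != cls[w] and rng.random() < p:
                adj[u].add(w)
                adj[w].add(u)
    return adj, cls


def red_fraction_at_distance_2(adj, cls):
    """Average over red v with N(v) nonempty of |red vertices in N(N(v))| / |N(N(v))|."""
    fractions = []
    for v in adj:
        if cls[v] != 0 or not adj[v]:
            continue
        S = set().union(*(adj[w] for w in adj[v]))  # S = N(N(v)); it contains v itself
        fractions.append(sum(cls[u] == 0 for u in S) / len(S))
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    n = 1500
    p = 15 / 1000  # each vertex has 1000 differently colored vertices, so expected degree 15
    adj, cls = random_3colorable(n, p)
    print(round(red_fraction_at_distance_2(adj, cls), 3))
```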

Worst-case graphs, however, are not random. Instead, we will use various techniques 
to force the graph to have properties of random graphs, or at least weak versions of these 
properties, that we need. One such property is that of being "well-distributed": we want 
N(N(v)), or at least an easy-to-select subset of N(N(v)), to have nearly half red vertices, 
so that the vertex-cover approximation algorithm can be used. The second such property 
is an expansion property: we want the selected subset of N(N(v)) to be significantly larger 
than N(v), so that our performance is much better than that achieved by looking only at 
immediate neighbors. 

Chapters 4 and 5 describe one general method for proving the existence of a form of 
good distribution in worst-case graphs and two methods for forcing expansion. The first 
method for forcing expansion (described in Chapter 4) is simple and elegant and results in 
a coloring of any 3-colorable graph with $\tilde{O}(n^{2/5})$ colors; the second (described in Chapter 5)
is more complicated, but results in an improved bound of $\tilde{O}(n^{3/8})$ colors.



² We can remove the restriction $d \le n^{1/2-\epsilon}$ by choosing $S$ to be a subset of $N(N(v))$ generated by
conceptually deleting edges from the graph at random until the average degree is below $n^{1/2-\epsilon}$, and then
letting $S = N(N(v))$ in this new graph.

³ Their algorithms differ slightly but the bounds are essentially the same. A version of their algorithm is
described in Appendix A for completeness. 
3.2 A few additional definitions

We now present a few additional definitions that will be needed in Chapters 4 and 5. Given 
a graph G = (V, E) on n vertices: 

• For $v \in V$, let $d_T(v) = |N(v) \cap T|$. We call $d_T(v)$ the degree into $T$ of $v$.

• For $S, T \subseteq V$, let $D_T(S) = \sum_{v \in S} d_T(v)$. We call $D_T(S)$ the degree into $T$ of $S$. Note that $d_T(v) = D_{\{v\}}(T)$ and $D_T(S) = D_S(T)$.

• Let $\delta = \delta(n) = \frac{1}{5\log n}$.

• Let $I_j = \{v \in V \mid d(v) \in [(1+\delta)^j, (1+\delta)^{j+1})\}$ for $j = 0, 1, 2, \dots$. That is, we divide the set of vertices of degree at least 1 into bins $I_j$ so that in each bin, the ratio of the degrees of any two vertices is less than $(1+\delta)$. The number of bins is at most

$$\log_{1+\delta} n \le (1 + O(\delta))\,\frac{1}{\delta}\ln n < \frac{1}{\delta}\log n.$$

• For $S \subseteq V$, let $N_i(S) = \{v \in N(S) \mid d_S(v) \in [(1+\delta)^i, (1+\delta)^{i+1})\}$ for $i = 0, 1, 2, \dots$. In other words, $N_i(S)$ ($0 \le i < \log_{1+\delta} n$) is the subset of vertices in $N(S)$ that are hit by at least $(1+\delta)^i$ and fewer than $(1+\delta)^{i+1}$ edges from $S$.
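The following Python sketch computes the bins $I_j$ and the sets $N_i(S)$ directly from these definitions. It is only an illustration under our own assumptions: the graph is a dict `adj` mapping each vertex to the set of its neighbors, and the helper names `bin_index`, `degree_bins`, and `neighbor_bins` are hypothetical. Bin boundaries are computed by repeated floating-point multiplication, so a value extremely close to a power of $(1+\delta)$ could land in an adjacent bin.

```python
from collections import defaultdict


def bin_index(x, delta):
    """Return j with (1 + delta)**j <= x < (1 + delta)**(j + 1), for x >= 1."""
    if x < 1:
        raise ValueError("bins are only defined for values >= 1")
    j, upper = 0, 1.0 + delta
    while upper <= x:
        upper *= 1.0 + delta
        j += 1
    return j


def degree_bins(adj, delta):
    """I_j: the vertices of degree >= 1, grouped by bin_index(d(v))."""
    bins = defaultdict(set)
    for v, nbrs in adj.items():
        if nbrs:
            bins[bin_index(len(nbrs), delta)].add(v)
    return dict(bins)


def neighbor_bins(adj, S, delta):
    """N_i(S): the vertices of N(S), grouped by bin_index(d_S(v)) with d_S(v) = |N(v) & S|."""
    S = set(S)
    nbhd = set().union(*(adj[u] for u in S))  # N(S)
    bins = defaultdict(set)
    for v in nbhd:
        bins[bin_index(len(adj[v] & S), delta)].add(v)
    return dict(bins)
```

In the algorithms of Chapter 4, `delta` would be set to $\delta(n)$.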

3.3 Useful definitions of progress 

In order to more easily describe and analyze the coloring algorithms presented, it will be useful to have several formal notions of "making progress" towards an $f(n)$-coloring of an $n$-vertex graph. These notions simplify the analysis by allowing us to aim for intermediate goals. While we will only need to consider $f(n)$ a function of the form $O(n^{\alpha}\log^{\beta} n)$, the notions of progress in fact hold for a more general class of "nearly-polynomial" functions, as defined below.

Definition 3.1 A function $f$ over $\mathbb{Z}^+$ is nearly-polynomial if it is non-decreasing and there exist constants $c, c' > 1$ such that for all sufficiently large $N$,

$$f(2N) \ge c f(N) \quad\text{and}\quad f(2N) \le c' f(N).$$

For example, if $f(n) = n^{1/2}$, then we may choose $c = c' = 2^{1/2}$. If $f(n) = n^{\alpha}\log^{\beta} n$ for $\alpha > 0$, then we may choose $c = 2^{\alpha}(1 - \epsilon)$ and $c' = 2^{\alpha}(1 + \epsilon)$ for any sufficiently small constant $\epsilon > 0$.
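A quick numerical illustration (not a proof) of the second example: with base-2 logarithms, $f(2N)/f(N) = 2^{\alpha}(1 + 1/\log N)^{\beta}$ for $f(n) = n^{\alpha}\log^{\beta} n$, so the ratio tends to $2^{\alpha}$. The sketch below, with hypothetical helper names `make_f` and `doubling_ratio`, evaluates it for $\alpha = 2/5$ and $\beta = 8/5$, the exponents that appear in Chapter 4.

```python
import math


def make_f(alpha, beta):
    """f(n) = n**alpha * (log2 n)**beta."""
    return lambda n: n ** alpha * math.log2(n) ** beta


def doubling_ratio(f, N):
    """f(2N) / f(N); a nearly-polynomial f keeps this inside [c, c'] with c > 1 for large N."""
    return f(2 * N) / f(N)


f = make_f(2 / 5, 8 / 5)
for N in (10**3, 10**6, 10**12):
    print(N, round(doubling_ratio(f, N), 4), round(2 ** (2 / 5), 4))
```

For $\beta > 0$ the ratio approaches $2^{\alpha}$ from above, which is why both constants $c$ and $c'$ can be taken close to $2^{\alpha}$ once $N$ is large.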

Three important ways of making progress towards an $f(n)$-coloring of an $n$-vertex $k$-colorable graph are defined as follows.
Progress Type 1: [Large-IS] Find an independent or 2-colorable⁴ set $S$ of size $\Omega(n/f(n))$.

Progress Type 2: [Small-Nbhd] Find an independent or 2-colorable set $S$ such that $|N(S)| = O(f(n)|S|)$.

Progress Type 3: [Same-Color] Find two vertices that must be the same color under any legal $k$-coloring of the graph.

Progress Type 1 "makes progress" because we can color the set found with at most two colors, throw away the colored vertices, pick two new colors to work with, and continue. The idea for progress Type 2 is that we can use it to find many different 2-colorable sets, each of which is independent of the others because each set has a small neighborhood; combining the sets found gives us a large 2-colorable set and thereby progress of Type 1. Progress Type 3 always helps us towards any approximate coloring. More formally, besides showing that each type of progress is useful individually, we would like to say that any combination of the three types of progress, in any order, yields an $O(f(n))$-coloring of an $n$-vertex $k$-colorable graph.

Lemma 3.1 If there exists a polynomial-time algorithm A that is guaranteed, given any $k$-colorable graph of $m$ vertices, to make progress of either Type 1, 2 or 3 towards an $O(f(m))$-coloring (where $f$ is nearly-polynomial), then there exists a polynomial-time algorithm B that colors any $n$-vertex $k$-colorable graph G with $O(f(n))$ colors.

Progress Type 1 and a weaker variant of Type 2 were used by Wigderson [43]. In fact, if we do not care about constants, we can state Wigderson's algorithm for coloring $n$-vertex 3-colorable graphs with $O(n^{1/2})$ colors as follows. If a vertex $v$ has a neighborhood of size $\Omega(n^{1/2})$, then we make progress of Type 1 using its neighborhood (which is 2-colorable, since the graph is 3-colorable); otherwise, $|N(v)| = O(1 \cdot n^{1/2})$, so we make progress of Type 2 using the independent set $\{v\}$.
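Wigderson's two cases can be sketched in Python as follows. This is a minimal illustration, assuming a 3-colorable input given as a dict `adj` of neighbor sets; the function names are hypothetical and the threshold $\lceil\sqrt{n}\,\rceil$ stands in for the unspecified constant in $\Omega(n^{1/2})$. Rather than routing the low-degree case through Type 2 progress and Lemma 3.1, the sketch finishes with the greedy step of Wigderson's original algorithm: once every remaining vertex has fewer than $\lceil\sqrt{n}\,\rceil$ remaining neighbors, greedy coloring needs at most $\lceil\sqrt{n}\,\rceil$ new colors.

```python
import math
from collections import deque


def two_color(adj, vertices):
    """BFS 2-coloring of the subgraph induced by `vertices`; returns {vertex: 0 or 1}, or None."""
    inside = set(vertices)
    side = {}
    for start in inside:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u] & inside:
                if w not in side:
                    side[w] = 1 - side[u]
                    queue.append(w)
                elif side[w] == side[u]:
                    return None  # odd cycle inside `vertices`
    return side


def wigderson_color(adj):
    """Sketch: color a 3-colorable graph (dict of neighbor sets) with O(sqrt(n)) colors."""
    threshold = max(1, math.ceil(math.sqrt(len(adj))))
    remaining = set(adj)
    color, next_color = {}, 0
    while True:
        # High-degree case: N(v) is 2-colorable in a 3-colorable graph; spend 2 fresh colors on it.
        v = next((u for u in remaining if len(adj[u] & remaining) >= threshold), None)
        if v is None:
            break
        nbhd = adj[v] & remaining
        side = two_color(adj, nbhd)
        if side is None:
            raise ValueError("N(v) is not bipartite, so the input is not 3-colorable")
        for u, s in side.items():
            color[u] = next_color + s
        next_color += 2
        remaining -= nbhd
    # Low-degree case: every remaining vertex has < threshold remaining neighbors,
    # so greedy coloring with fresh colors uses at most `threshold` of them.
    for u in remaining:
        used = {color[w] for w in adj[u] if w in color}
        c = next_color
        while c in used:
            c += 1
        color[u] = c
    return color
```

Each high-degree round removes at least $\lceil\sqrt{n}\,\rceil$ vertices, so for a 3-colorable input the sketch uses at most $2\sqrt{n} + \lceil\sqrt{n}\,\rceil$ colors.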

We can also state simply the algorithm of Berger and Rompel [6] to color any 3-colorable graph with $O(\sqrt{n}/\sqrt{\log n})$ colors using these types of progress (here, $f(n) = \sqrt{n}/\sqrt{\log n}$). Select a subset $S'$ of $3\log n$ vertices in graph G arbitrarily and examine every independent subset $S$ of $S'$ of size $\log n$. Note that there are at most $\binom{3\log n}{\log n} < n^3$ such subsets, so this can be done in polynomial time. For each subset $S$, test to see if its neighborhood is 2-colorable; this test will succeed for some $S$, since at least one such subset must consist of vertices all the same color in some legal 3-coloring of G (by pigeonhole, some color class contains at least $\log n$ of the $3\log n$ selected vertices). Now, if $|N(S)| \ge \sqrt{n}\sqrt{\log n}$, we have made progress of Type 1. If $|N(S)| < \sqrt{n}\sqrt{\log n}$, then we have made progress of Type 2.
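The enumeration step can be sketched as below, assuming a dict-of-sets graph; `find_candidate_set`, `is_bipartite`, and `subset_count` are hypothetical names. The sketch returns the first independent $k$-subset $S$ of the sample whose neighborhood passes the 2-coloring test, together with $N(S)$; choosing between Type 1 and Type 2 progress by comparing $|N(S)|$ with $\sqrt{n}\sqrt{\log n}$ is left to the caller. `subset_count` is there to check the bound $\binom{3k}{k} < 2^{3k}$, which equals $n^3$ when $k = \log n$.

```python
import math
from collections import deque
from itertools import combinations


def is_bipartite(adj, vertices):
    """BFS 2-coloring test on the subgraph induced by `vertices`."""
    inside = set(vertices)
    side = {}
    for start in inside:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u] & inside:
                if w not in side:
                    side[w] = 1 - side[u]
                    queue.append(w)
                elif side[w] == side[u]:
                    return False
    return True


def find_candidate_set(adj, sample, k):
    """Scan k-subsets S of `sample`; return (S, N(S)) for the first independent S with 2-colorable N(S)."""
    for subset in combinations(sample, k):
        S = set(subset)
        if any(adj[u] & S for u in S):
            continue  # S is not independent
        nbhd = set().union(*(adj[u] for u in S))
        if is_bipartite(adj, nbhd):
            return S, nbhd
    return None  # cannot happen for a 3-colorable graph when len(sample) >= 3k


def subset_count(k):
    """Number of k-subsets of a 3k-element sample."""
    return math.comb(3 * k, k)
```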

⁴ Technically, an independent set is 2-colorable. We list both here to emphasize that there is no need for the set $S$ to require 2 colors. Also, we label this type of progress "Large-IS" since, given a 2-colorable set, one can easily find an independent subset only a factor of 2 smaller.
We now prove Lemma 3.1, showing that these types of progress really do "make progress".

Proof of Lemma 3.1: First, if algorithm A ever makes progress of Type 3 [Same-Color] on a subgraph of G, then since the two vertices $u$ and $v$ found must be the same color under any $k$-coloring of the subgraph, they also must be the same color under any $k$-coloring of G. So, we can just merge the vertices $u$ and $v$ into a new vertex with neighborhood $N(u) \cup N(v)$ and start again from the beginning: in doing so, we remove one vertex from G and use no colors. Thus, we may assume from now on that A only makes progress of Types 1 or 2 when applied to any subgraph of G.

Claim: If for some constant $\epsilon > 0$ we can always find a 2-colorable set of size $\epsilon m/f(m)$ in a $k$-colorable graph of $m$ vertices, then we can achieve an $O(f(n))$-coloring of G as follows. We find such a set in G, color it with two colors, remove those vertices from the graph, and repeat.

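Proof of Claim: The proof is just a straightforward calculation, given below. The number $C(m)$ of colors used satisfies $C(m) \le 2 + C(m - \epsilon m/f(m))$. Since $f$ is a nearly-polynomial function, for each $m'$ in the range $[m/2, m]$, we have:

$$\begin{aligned}
C(m') &\le 2 + C(m' - \epsilon m'/f(m')) \\
&\le 2 + C(m' - \epsilon(m/2)/f(m)). \qquad \text{(because } f \text{ is non-decreasing)}
\end{aligned}$$

Applying this last inequality $f(m)/\epsilon$ times, we get $C(m) \le 2f(m)/\epsilon + C(m/2)$, which implies

$$\begin{aligned}
C(m) &\le \frac{2}{\epsilon}\left[f(m) + f(m/2) + \cdots + f(1)\right] \\
&\le \frac{2f(m)}{\epsilon}\left[1 + \frac{1}{c} + \frac{1}{c^2} + \frac{1}{c^3} + \cdots\right] + O(1) \qquad \text{(since } f(n) \ge c f(n/2) \text{ for } n \text{ large enough)} \\
&\le \frac{2c}{\epsilon(c-1)} f(m) + O(1) \\
&= O(f(m)).
\end{aligned}$$

□ (End proof of claim.)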
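The recurrence can be sanity-checked numerically. In the sketch below (an illustration only), $f(m) = \sqrt{m}$, which is nearly-polynomial with $c = \sqrt{2}$, and the constant $\epsilon$ is chosen arbitrarily; each round removes $\lceil \epsilon m/f(m) \rceil$ vertices at a cost of two colors, and the total is compared with the leading term $2c f(m)/(\epsilon(c-1))$ of the bound. The helper names are hypothetical.

```python
import math


def colors_used(m, f, eps):
    """Simulate C(m) <= 2 + C(m - eps*m/f(m)): each round 2-colors and removes eps*m/f(m) vertices."""
    colors = 0
    while m > 0:
        m -= max(1, math.ceil(eps * m / f(m)))
        colors += 2
    return colors


def claimed_bound(m, f, eps, c):
    """Leading term 2c f(m) / (eps (c - 1)) from the proof of the claim."""
    return 2 * c * f(m) / (eps * (c - 1))


m = 10**6
print(colors_used(m, math.sqrt, 1.0), claimed_bound(m, math.sqrt, 1.0, math.sqrt(2)))
```

Rounding the removed amount up only removes vertices faster, so the simulated count can only fall below the continuous estimate.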

Thus, to prove the lemma, we just need some algorithm B' that on any $k$-colorable graph of $m$ vertices finds a 2-colorable set of size $\Omega(m/f(m))$. Algorithm B' works as follows.

On input $(V, E)$, where $m = |V|$,

1. Initialize set $U$ to the empty set and initialize $V'$ to $V$.

2. While $|V'| > m/2$ do:

(a) Let $(V', E')$ be the subgraph induced by the vertices in $V'$. Run algorithm A on $(V', E')$.

(b) If A returns with progress of Type 1 [Large-IS], then since $|V'| > m/2$, we have a 2-colorable set of size $\Omega\left(\frac{m/2}{f(m/2)}\right) = \Omega(m/f(m))$ (since $f$ is nearly-polynomial), so halt and output that set.

(c) If A returns with progress of Type 2 [Small-Nbhd], let $S$ denote the set returned by A. Now, update:

$U \leftarrow U \cup S$
$V' \leftarrow V' - (S \cup N(S))$.

Notice that in this step, each time we add vertices to $U$, we remove all their neighbors from $V'$. So, we maintain the invariant that $U$ has no neighbors in $V'$.

3. Halt and output U. 

If we reach step 3 in the above algorithm, it must be that at that point, $|V'| \le m/2$. Set $U$ is a 2-colorable set since each set $S$ added to $U$ in step 2(c) is 2-colorable and, by the invariant mentioned in 2(c), the sets $S$ are all independent of each other (thus, we may use the same 2 colors on each set $S$). Set $U$ is also large because for each set $S$ of size $r$ found in 2(c), we add $r$ vertices to $U$ and remove at most $r + t r f(m)$ vertices from $V'$ for some constant $t$, by the definition of progress Type 2 [Small-Nbhd].⁵ Thus, $|V - V'|$ is at least $m/2$ and $|V - V'|$ is at most $|U| + t|U|f(m)$. Combining the two inequalities, we find $|U| + t|U|f(m) \ge m/2$, which implies $|U| = \Omega(m/f(m))$. This large 2-colorable set is exactly what we needed from algorithm B'. ■
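Algorithm B' can be sketched as a loop around an arbitrary progress oracle. In the Python sketch below, the oracle is a hypothetical callable that receives the graph and the current vertex set $V'$ and returns either `("large", S)` (Type 1) or `("small", S)` (Type 2) with $S$ nonempty; Type 3 is assumed to have been handled already by merging vertices, as in the proof. The toy `singleton_oracle`, which always reports the smallest remaining vertex as a Type 2 set, exists only to exercise the loop.

```python
def algorithm_b_prime(adj, oracle):
    """Sketch of algorithm B': grow U from Type 2 sets until at most half the vertices remain."""
    m = len(adj)
    U, V_prime = set(), set(adj)
    while len(V_prime) > m / 2:
        kind, S = oracle(adj, V_prime)
        S = set(S)
        if not S:
            raise ValueError("the oracle must return a nonempty set")
        if kind == "large":
            return S  # Type 1: a large 2-colorable set, output it directly
        # Type 2: delete S together with its neighbors inside V' (the subgraph (V', E'))
        nbhd = set().union(*(adj[u] & V_prime for u in S))
        U |= S
        V_prime -= S | nbhd
    return U  # invariant: no vertex of U has a neighbor left in V'


def singleton_oracle(adj, V_prime):
    """Toy oracle (illustration only): report {min vertex of V'} as Type 2 progress."""
    return "small", {min(V_prime)}
```

On a path $0, 1, \dots, 9$ the toy oracle makes the loop collect $U = \{0, 2, 4\}$ before $|V'|$ drops to $m/2$ or below.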

By Lemma 3.1, we now may just aim for progress of one of the three types in our coloring 
algorithms. This fact will simplify the statements and correctness proofs of algorithms 
presented in Chapters 4, 5, and 6. 

Also, as a simple application of these types of progress, note that progress Type 2 [Small-Nbhd] can be used to guarantee that for each vertex $v$, the set $N(N(v))$ has size $\Omega(f(n)^2)$: we make progress if $|N(v)| < f(n)$, since $\{v\}$ is an independent set, and we make progress if $|N(N(v))| < f(n)|N(v)|$, since $N(v)$ is 2-colorable. Thus, we get the following corollary. (We assume here that $f$ is nearly-polynomial.)

Corollary 3.2 If G is an $n$-vertex 3-colorable graph such that $|N(N(v))| = O(f(n)^2)$ for some vertex $v$, then we can make progress towards an $O(f(n))$-coloring of G.



⁵ Here we use the fact that $f$ is non-decreasing.



Chapter 4 

Worst-case bounds for 3-colorable graphs: first algorithm



In this chapter, we describe an algorithm to color any $n$-vertex 3-colorable graph with $O(n^{0.4})$ colors, up to polylogarithmic factors. As mentioned in the last chapter, the algorithm consists of two major parts.
First, we force the graph without loss of generality to have a useful expansion property. 
Second, we find and take advantage of a form of good distribution of edges that we show 
must exist in any 3-colorable graph. Some of the theorems we prove, in particular those in 
Section 4.3 concerning the distribution property, hold more generally for graphs constrained 
only to have large independent sets. This fact will be useful for us later in Chapter 6 for 
extending these techniques to graphs of higher chromatic number. 

4.1 Forcing expansion 

In this section, we show that if our goal is to color a 3-colorable graph G with $O(f(n))$ colors, where $f$ is a nearly-polynomial function as in Definition 3.1, then we may assume without loss of generality that no two vertices share more than $n/[f(n)]^2$ neighbors. So, for example, if we wish to color with $O(n^{\alpha})$ colors, we may assume for all $u, v \in V$ that $|N(u) \cap N(v)| \le n^{1-2\alpha}$ (for $\alpha = 0.4$, the shared neighborhood may have size at most $n^{0.2}$). This is our first method for forcing expansion in the graph.

Bounding the number of neighbors that may be shared by two vertices forces expansion in the following way. Suppose we wish to color with $n^{\alpha}$ colors. If we look at the neighborhood of some vertex $v$ and consider an arbitrary subset of $m + d(v)$ edges leaving $N(v)$, then we may assume those edges enter into at least $m/n^{1-2\alpha}$ vertices other than $v$. The reason is that at most $d(v)$ of these edges enter $v$, so at least $m$ of them enter other vertices; if they entered fewer than $m/n^{1-2\alpha}$ such vertices, some vertex $w \ne v$ would have more than $n^{1-2\alpha}$ neighbors in $N(v)$. This fact will be useful when we show in Section 4.3 how to find such a set of $m$ edges whose endpoints contain an easy-to-find independent set.

Given the three methods for making progress defined in the last chapter, this method for forcing expansion falls out easily. Throughout this section, we assume $f$ is a nearly-polynomial function.
Theorem 4.1 If G is an $n$-vertex 3-colorable graph containing vertices $u$ and $v$ such that

$$|N(u) \cap N(v)| = \Omega(n/[f(n)]^2),$$

then we can make progress of Type 1, 2, or 3 towards an $O(f(n))$-coloring of G.

Proof: Suppose $u$ and $v$ are two vertices that share a neighborhood $S = N(u) \cap N(v)$ of size $\Omega(n/[f(n)]^2)$. Clearly, $S$ is 2-colorable since it is a subset of the neighborhood of $u$. So, if $|N(S)| < n/f(n)$, then we have made progress Type 2 [Small-Nbhd]. On the other hand, if $|N(S)| \ge n/f(n)$ and $N(S)$ is 2-colorable, then we have made progress of Type 1 [Large-IS]. The last possibility is that $N(S)$ is not 2-colorable (and that it is large, but we will not need this fact). But, this last case means that $u$ and $v$ must be the same color under any legal 3-coloring of G. The reason is that if $u$ and $v$ could possibly be different colors under some legal 3-coloring (say blue and green), then $S$ would be monochromatic (red), so $N(S)$ would be 2-colorable (blue and green). So, if our attempt to 2-color $N(S)$ fails, then we make progress of Type 3 [Same-Color]. ■
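The case analysis in this proof translates directly into a small decision procedure. The sketch below is a hypothetical illustration: it assumes a dict-of-sets graph, takes the value $f(n)$ as a parameter `f_n`, and assumes the caller has already checked that $u$ and $v$ share $\Omega(n/[f(n)]^2)$ neighbors. It reports which type of progress applies, together with the corresponding set or pair.

```python
from collections import deque


def is_bipartite(adj, vertices):
    """BFS 2-coloring test on the subgraph induced by `vertices`."""
    inside = set(vertices)
    side = {}
    for start in inside:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x] & inside:
                if y not in side:
                    side[y] = 1 - side[x]
                    queue.append(y)
                elif side[y] == side[x]:
                    return False
    return True


def progress_from_shared_neighbors(adj, u, v, f_n):
    """Case analysis from the proof of Theorem 4.1 for a pair u, v with many common neighbors."""
    n = len(adj)
    S = adj[u] & adj[v]                       # 2-colorable, since S lies inside N(u)
    nbhd = set().union(*(adj[w] for w in S))  # N(S)
    if len(nbhd) < n / f_n:
        return "type2", S                     # Small-Nbhd
    if is_bipartite(adj, nbhd):
        return "type1", nbhd                  # Large-IS: N(S) is large and 2-colorable
    return "type3", (u, v)                    # Same-Color: u, v agree in every legal 3-coloring
```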

We can use the same argument as above to guarantee without loss of generality that a selected set $S$ of size $\Omega(n/f(n)^2)$ in G is not monochromatic under any legal 3-coloring of G. In particular, suppose $S$ were monochromatic, so $N(S)$ is 2-colorable. Then, if $|N(S)| \ge n/f(n)$ we make progress Type 1 [Large-IS], and if $|N(S)| < n/f(n)$ we make progress Type 2 [Small-Nbhd]. So, we get the following corollary.

Corollary 4.2 Given an independent set $S$ of size $\Omega(n/f(n)^2)$ in an $n$-vertex 3-colorable graph G, we can either make progress towards an $O(f(n))$-coloring of G or else guarantee that the vertices of $S$ are not all the same color under any legal 3-coloring of G.

While this corollary is not immediately useful for us here, an improved, more complicated method for forcing expansion (described in Chapter 5) consists in part of an improvement to this corollary, and leads to better coloring guarantees.

4.2 The algorithm 

We now describe the algorithm for coloring $n$-vertex 3-colorable graphs with $O(n^{2/5}\log^{8/5} n)$ colors. As mentioned in the last chapter, the algorithm uses a vertex cover approximation algorithm of Bar-Yehuda and Even [4] and (independently) Monien and Speckenmeyer [28] that finds a vertex cover of size at most $\left(2 - \frac{\log\log n}{2\log n}\right)$ times the size of the minimum vertex cover in a graph. We will call their algorithm the BE/MS algorithm. A simpler version of their procedure for the special case in which it is used in this thesis is given as Algorithm Approx-IS in Appendix A.




Algorithm First-Approx: 

Given: $G = (V, E)$, a 3-colorable graph on $n$ vertices. Let $f(n) = n^{2/5}(\log n)^{8/5}$.
Output: Progress of Type 1, 2, or 3 towards an $O(n^{2/5}(\log n)^{8/5})$-coloring of G.

1. [Min degree] For each vertex v, if d(v) < f(n), make progress Type 2 [Small-Nbhd]. 

2. [Expansion] For each pair of vertices $u, v$, if $|N(u) \cap N(v)| > n/[f(n)]^2$, then make progress using Theorem 4.1.

3. [Dist-2 Neighbors] Otherwise, for each vertex $v$, for each $i, j \in \{0, 1, \dots, 5\log^2 n\}$:

Let $T_{v,i,j} = N_i(N(v) \cap I_j)$.

(Recall the definitions of Section 3.2.)

4. [VC approx] Run the BE/MS Vertex-Cover approximation algorithm on each $T_{v,i,j}$. If we find an independent set of size $\Omega(n^{3/5}/(\log n)^{8/5})$, we have made progress Type 1 [Large-IS].
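Step 3 can be sketched as follows. This Python fragment is an illustration under stated assumptions: the graph is a dict of neighbor sets, `delta` is supplied by the caller, the names `bin_index` and `candidate_sets` are hypothetical, and only the nonempty sets $T_{v,i,j}$ are kept. Step 4 (the BE/MS vertex-cover approximation) is not included.

```python
from collections import defaultdict


def bin_index(x, delta):
    """Return j with (1 + delta)**j <= x < (1 + delta)**(j + 1), for x >= 1."""
    j, upper = 0, 1.0 + delta
    while upper <= x:
        upper *= 1.0 + delta
        j += 1
    return j


def candidate_sets(adj, delta):
    """Step 3 of First-Approx: T[v, i, j] = N_i(N(v) & I_j), keeping nonempty sets only."""
    deg_bin = {w: bin_index(len(nbrs), delta) for w, nbrs in adj.items() if nbrs}
    T = {}
    for v, nbrs in adj.items():
        by_bin = defaultdict(set)  # j -> N(v) & I_j
        for w in nbrs:
            by_bin[deg_bin[w]].add(w)
        for j, S in by_bin.items():
            groups = defaultdict(set)  # i -> N_i(S)
            for u in set().union(*(adj[w] for w in S)):
                groups[bin_index(len(adj[u] & S), delta)].add(u)
            for i, members in groups.items():
                T[v, i, j] = members
    return T
```

Note that $v$ itself can appear in some $T_{v,i,j}$, since it is adjacent to every vertex of $N(v) \cap I_j$; the proof of Corollary 4.7 deals with this case.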

The next two sections are devoted to proving the following theorem. 

Theorem 4.3 (Main Theorem) Algorithm First-Approx makes progress of Types 1, 2, or 3 towards an $O(n^{2/5}(\log n)^{8/5})$-coloring of any $n$-vertex 3-colorable graph.

Using Lemma 3.1 (the usefulness of making progress), we get the following corollary. 

Corollary 4.4 There exists a polynomial-time algorithm that will color any 3-colorable $n$-vertex graph with $O(n^{2/5}(\log n)^{8/5})$ colors.

Let us calculate the running time of the coloring algorithm. The BE/MS algorithm runs in time $O(NM)$ on any $N$-vertex graph with $M$ edges. We may assume for simplicity that the graph in Step 4 of algorithm First-Approx has size at most $n^{3/5}$; otherwise we just remove excess vertices at random. So, the running time of algorithm First-Approx, which is dominated by Steps 3 and 4, is at most:

$$[(n \text{ vertices}) \cdot (\log^2 n\ j\text{'s}) \cdot (\log^2 n\ i\text{'s}) \text{ in Step 3}] \times [n^{3/5}(n^{3/5})^2 \text{ for vertex cover in Step 4}] = O(n^{14/5}\log^4 n),$$

which is polynomial in $n$. Note that this is the time needed to give one color to $\Omega(n^{3/5}/(\log n)^{8/5})$ vertices. One may have to run the algorithm $O(n^{2/5}(\log n)^{8/5})$ times in order to color the entire graph.
4.3 Forcing good distribution

From the last sections, we know that if we wish to color an $n$-vertex graph with $O(f(n))$ colors, then we may assume that the graph has minimum degree $f(n)$ (or else we make progress Type 2 [Small-Nbhd]) and no two vertices share more than $n/[f(n)]^2$ neighbors (or else we make progress with Theorem 4.1).

The goal of this section is to show how, given such a graph G, to find a small number of subgraphs such that at least one must be both nearly half red under some legal 3-coloring of G (at least $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ of its vertices red), and large (size $\Omega(f(n)^4/n) = \Omega(n^{3/5})$ for $f(n) = \Omega(n^{2/5})$, up to polylogarithmic factors). In particular, we will show this holds true for one of a small number of subsets of the neighbors of the neighbors of $v$ for some vertex $v$ in the graph.

We will assume without loss of generality that red is the color in G such that $D(\mathrm{red}) = \max(D(\mathrm{red}), D(\mathrm{blue}), D(\mathrm{green}))$. That is, of the three colors, red is the color with the most edges incident to vertices of that color. The assumption on red implies that $D(\mathrm{red}) \ge \frac{1}{2}(D(\mathrm{blue}) + D(\mathrm{green}))$, so

$$D_{\mathrm{red}}(\mathrm{blue} \cup \mathrm{green}) \ge \tfrac{1}{2} D(\mathrm{blue} \cup \mathrm{green}). \tag{4.1}$$

Note also that if $d$ is the average degree of the vertices in G, then $D(\mathrm{red}) \ge d|\mathrm{red}|$.

4.3.1 The basic approach, and a counterexample to the naive strategy 

In order to find a large subgraph that is nearly half red, the first step will be to find a large subset $S \subseteq \mathrm{blue} \cup \mathrm{green}$ such that nearly half of the edges leaving $S$ enter into red vertices. We know that if we look at the entire set $\mathrm{blue} \cup \mathrm{green}$, at least half of the edges leaving that set enter into red vertices (equation (4.1)). The problem is: we do not know how to find $\mathrm{blue} \cup \mathrm{green}$. We can, however, look at subsets of $\mathrm{blue} \cup \mathrm{green}$ by considering vertex neighborhoods, many of which (for red starting vertices) will be blue and green.

Given the property of $\mathrm{blue} \cup \mathrm{green}$ described in equation (4.1), one might expect that this property would hold for the neighborhood of some vertex as well: that is, that for some $v \in \mathrm{red}$, we would have $D_{\mathrm{red}}(N(v)) \ge \frac{1}{2}D(N(v))$. Unfortunately, this need not be the case, and what follows is a counterexample to this seemingly innocent claim.

Consider a graph with $m$ red vertices $r_0, \dots, r_{m-1}$; $m+1$ green vertices $g_0, \dots, g_m$; and $m+1$ blue vertices $b_0, \dots, b_m$ (with $m$ even). Vertices $g_m$ and $b_m$ are two distinguished vertices with large degree and twice as many edges into blue or green vertices as into red vertices. The rest of the vertices have low degree, but together there are enough edges with red endpoints so that $D(\mathrm{red})$ is greater than $D(\mathrm{blue})$ or $D(\mathrm{green})$. More specifically, the edges in the graph are (see Figure 4.1):

$$\begin{aligned}
& \{(g_m, r_0), (g_m, r_1), \dots, (g_m, r_{m/2-1})\} \\
\cup\ & \{(g_m, b_0), (g_m, b_1), \dots, (g_m, b_{m-1})\} \\
\cup\ & \{(b_m, r_{m/2}), (b_m, r_{m/2+1}), \dots, (b_m, r_{m-1})\} \\
\cup\ & \{(b_m, g_0), (b_m, g_1), \dots, (b_m, g_{m-1})\} \\
\cup\ & \{(g_i, r_i), (g_i, r_{(i+1) \bmod m})\} \text{ for each } 0 \le i \le m-1 \\
\cup\ & \{(b_i, r_i), (b_i, r_{(i+1) \bmod m})\} \text{ for each } 0 \le i \le m-1.
\end{aligned}$$

[Figure 4.1: A counterexample to the naive strategy. For clarity, only edges incident to the distinguished vertices $g_m$ and $b_m$, and edges incident to a typical red vertex, are shown. The four edges between the red vertex and the non-distinguished blue and green vertices are drawn as dashed lines.]

That is, vertices $g_m$ and $b_m$ are each connected to a different half of the red vertices, and each is connected to all the vertices of index less than $m$ of the remaining color. In addition, each $r_i$ is connected to two green and two blue vertices of index less than $m$.

So, $D(\mathrm{red}) = 5m$, $D(\mathrm{green}) = (4 + \frac{1}{2})m$, and $D(\mathrm{blue}) = (4 + \frac{1}{2})m$ (each red vertex has degree 5, each non-distinguished green or blue vertex has degree 3, and $g_m$ and $b_m$ each have degree $3m/2$). But, for each red vertex $v$, we have $D_{\mathrm{red}}(N(v)) = 8 + m/2$ and $D_{V-\mathrm{red}}(N(v)) = 4 + m$, which implies $D(N(v)) = 12 + 3m/2$. So, $D_{\mathrm{red}}(N(v))$ is approximately one third of $D(N(v))$ rather than one half. One can also construct variations of this counterexample in which the ratio between $D_{\mathrm{red}}(N(v))$ and $D(N(v))$ is even worse.
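The degree counts in this counterexample are easy to verify mechanically. The Python sketch below builds the graph for an even $m$, using our own vertex encoding (`("r", i)`, `("g", i)`, `("b", i)`), and computes $D_T(S)$ straight from the definition in Section 3.2; `build_counterexample` and `D` are hypothetical names.

```python
def build_counterexample(m):
    """Adjacency sets for the Figure 4.1 graph; m must be even."""
    if m % 2:
        raise ValueError("m must be even")
    r = [("r", i) for i in range(m)]
    g = [("g", i) for i in range(m + 1)]
    b = [("b", i) for i in range(m + 1)]
    edges = set()
    edges |= {(g[m], r[i]) for i in range(m // 2)}      # g_m to the first half of red
    edges |= {(g[m], b[i]) for i in range(m)}           # g_m to all ordinary blue vertices
    edges |= {(b[m], r[i]) for i in range(m // 2, m)}   # b_m to the second half of red
    edges |= {(b[m], g[i]) for i in range(m)}           # b_m to all ordinary green vertices
    for i in range(m):
        edges |= {(g[i], r[i]), (g[i], r[(i + 1) % m])}
        edges |= {(b[i], r[i]), (b[i], r[(i + 1) % m])}
    adj = {v: set() for v in r + g + b}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    return adj


def D(adj, S, T=None):
    """D_T(S) = sum over v in S of |N(v) & T|; with T=None this is D(S), the total degree of S."""
    return sum(len(adj[v]) if T is None else len(adj[v] & T) for v in S)
```

With $m = 10$, for example, this gives $D(\mathrm{red}) = 50$ and, for $v = r_0$, $D_{\mathrm{red}}(N(v)) = 13$ out of $D(N(v)) = 27$.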

The problem here is that the vertices have wildly varying degrees. While one can also find variations on this counterexample that hold even when all vertices have degrees in the range $[n^{\alpha-\epsilon}, n^{\alpha+\epsilon}]$ for any $\epsilon > 0$, if we restrict the vertex degrees extremely tightly then the desired property does hold. That is, if the degrees are nearly identical, then there exists $v \in V$ such that $N(v)$ has nearly half the edges leaving it entering into red vertices. This is the purpose of the bins $I_j$ and is the intuition for Theorem 4.5 below.

Once we have a set $S \subseteq N(v)$ with nearly half the edges leaving it entering into red vertices, we again use a similar idea to find a large set inside $N(S)$ which is nearly half red. The trick again is to separate vertices according to degree, which is the purpose of the sets $N_i(S)$. This step is handled by Theorem 4.6.

4.3.2 Theorems and proofs 

We now describe the theorems that allow the above basic idea and the algorithm First-Approx to succeed. These theorems are stated in terms of graphs that are not necessarily 3-colorable but contain a large independent set $R$. (The symbol "$R$" is used to be suggestive of the set red.)

Theorem 4.5 Given an $n$-vertex graph $G = (V, E)$ with average vertex degree $d$, and an independent set $R$ such that (1) $D_R(V - R) \ge \lambda D(V - R)$ for some $0 < \lambda \le 1$ and (2) $D(R) \ge d|R|$, then for some $v \in R$ and some bin $I_j$:

1. $|N(v) \cap I_j| \ge \delta^2 d/\log_{1+\delta} n$,

2. $D_R(N(v) \cap I_j) \ge \lambda(1 - 3\delta)D(N(v) \cap I_j)$.

In other words, for some $v \in R$, the set $N(v) \cap I_j$ is a reasonably large fraction of $N(v)$ and has almost a fraction $\lambda$ of the edges incident to it going into $R$. We now look at the neighbors of $N(v) \cap I_j$ and show that for some $i$, the set $N_i(N(v) \cap I_j)$ has the properties we need.

Theorem 4.6 Given an $n$-vertex graph $G = (V, E)$, a set $R \subseteq V$, and $\lambda' \in [0, 1]$:

For any set $S$ such that $D_R(S) \ge \lambda' D(S)$, there must exist some $i < \log_{1+\delta} n$ such that:

1. $D_{N_i(S) \cap R}(S) \ge \delta D_R(S)/(\log_{1+\delta} n)$,

2. $|N_i(S) \cap R|/|N_i(S)| \ge (1 - 2\delta)\lambda'$.

Assuming for now the correctness of Theorems 4.5 and 4.6, we can prove a corollary 
showing why at least one of the sets created in Step 3 of Algorithm First-Approx will both 
be large and contain an independent set of nearly half its vertices (and so be of the right 
form for the vertex-cover algorithm used in Step 4). 
Corollary 4.7 Given an $n$-vertex 3-colorable graph $G = (V, E)$ such that (1) no two vertices share more than $s$ neighbors and (2) $G$ has minimum degree $d_{\min} > \max\{s(1+\delta), (3\log_{1+\delta} n)/\delta\}$, then for some $v \in V$ and some $i, j \in [0, 5\log^2 n]$, the set

$$T = N_i(N(v) \cap I_j)$$

has at least $\Omega(d_{\min}^2/(s\log^7 n))$ vertices, of which at least a fraction $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ are colored red under some legal 3-coloring of G.

Proof of Corollary 4.7: By definition of set red in G, the conditions of Theorem 4.5 are satisfied for $R = \mathrm{red}$ and $\lambda = 1/2$ (see equation (4.1)). Let vertex $v$ and bin $I_j$ be such that claims (1) and (2) of Theorem 4.5 are satisfied for $S = N(v) \cap I_j$. By claim (2) of Theorem 4.5, set $S$ satisfies the conditions of Theorem 4.6 with $\lambda' = \lambda(1 - 3\delta)$. Let $i$ be the index such that claims (1) and (2) of Theorem 4.6 are satisfied and let $T = N_i(S)$. Then:

\begin{align*}
D_{T \cap R}(S) &\ge \delta D_R(S)/(\log_{1+\delta} n) && \text{(Theorem 4.6, claim 1)}\\
&\ge \delta[\lambda(1 - 3\delta)D(S)]/(\log_{1+\delta} n) && \text{(Theorem 4.5, claim 2)}\\
&\ge \delta\lambda(1 - 3\delta)[d_{\min}|S|]/(\log_{1+\delta} n) && \text{(for all } v,\ d(v) \ge d_{\min}) \tag{4.2}\\
&\ge \delta^3\lambda(1 - 3\delta)d_{\min}^2/(\log_{1+\delta} n)^2 && \text{(Theorem 4.5, claim 1)}\\
&= \Omega(\delta^5 d_{\min}^2/\log^2 n) && \text{(using } \log_{1+\delta} n = O(\tfrac{1}{\delta}\log n))\\
&= \Omega(d_{\min}^2/\log^7 n). && (\delta = \tfrac{1}{5\log n})
\end{align*}

Since no two vertices share more than $s$ neighbors and $S \subseteq N(v)$, we know no vertex $w \ne v$ has more than $s$ neighbors in $S$. Since we have also assumed that $d_{\min} > s(1+\delta)$, we know that the set $N_{i'}(S)$ containing $v$ contains no other vertices besides $v$, by definition of $N_{i'}$. Also, since $d_{\min} > (3\log_{1+\delta} n)/\delta$, by equation (4.2) we have $D_{T \cap R}(S) > |S|$, so we know $T \ne \{v\}$ and thus $v \notin T$. So, set $T$ consists only of vertices with at most $s$ neighbors in $S$, and we have:

$$|T| \ge D_{T \cap R}(S)/s = \Omega(d_{\min}^2/(s\log^7 n)).$$

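Also, the fraction of red vertices in $T$ is large:

\begin{align*}
|T \cap R|/|T| &\ge \lambda(1 - 2\delta)(1 - 3\delta) && \text{(Theorem 4.5 claim 2, and Theorem 4.6 claim 2)}\\
&\ge \tfrac{1}{2}(1 - 5\delta) && \text{(by definition of red, we have } \lambda \ge 1/2)\\
&= \tfrac{1}{2}\left(1 - \tfrac{1}{\log n}\right).
\end{align*}

Thus, set $T$ satisfies both claims of the corollary. ■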

Before proving Theorems 4.5 and 4.6, we state a simple combinatorial lemma: 
Lemma 4.8 Given $b$ balls of which $r$ are red, all placed in $k$ boxes, then for any $\epsilon$ ($0 < \epsilon < 1$), there is some box with at least $\epsilon r/k$ red balls such that the ratio of the number of red balls to the total number of balls inside that box is more than $(1 - \epsilon)r/b$.

Proof: Throw out all boxes with fewer than $\epsilon r/k$ red balls. The minimum possible ratio of red balls to total balls left is $(r - \epsilon r)/(b - \epsilon r)$, since at worst we throw out $k$ boxes containing only red balls. This ratio is strictly greater than $(1 - \epsilon)r/b$. So, by pigeonholing, there must exist at least one box left with a ratio of red balls to total balls at least this large. ■
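Lemma 4.8 is constructive, and its proof suggests a direct procedure: discard the boxes with too few red balls, then take the kept box with the highest red ratio. The sketch below (hypothetical name `good_box`) follows that recipe, using exact `Fraction` arithmetic so that the strict inequality of the lemma is not blurred by rounding.

```python
from fractions import Fraction


def good_box(boxes, eps):
    """Return the index of a box promised by Lemma 4.8.

    boxes: list of (red, total) pairs with 0 <= red <= total and at least one red ball overall.
    eps:   a Fraction with 0 < eps < 1.
    The returned box holds >= eps*r/k red balls and has red ratio > (1 - eps) * r / b.
    """
    k = len(boxes)
    r = sum(red for red, _ in boxes)
    b = sum(total for _, total in boxes)
    if r == 0:
        raise ValueError("the lemma needs at least one red ball")
    # Discard boxes with fewer than eps*r/k red balls; fewer than eps*r red balls are lost overall,
    # so at least one box survives.
    kept = [i for i, (red, _) in enumerate(boxes) if red >= eps * r / k]
    # Pigeonhole: the reddest kept box is at least as red as all kept boxes pooled together.
    return max(kept, key=lambda i: Fraction(boxes[i][0], boxes[i][1]))
```

For instance, with boxes $(1, 1)$ and $(10, 40)$ and $\epsilon = 1/2$, the first box is the reddest but holds fewer than $\epsilon r/k = 11/4$ red balls, so the sketch returns the second box, whose ratio $1/4$ still exceeds $(1 - \epsilon)r/b = 11/82$.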



Proof of Theorem 4.5: For convenience, we call vertices in the independent set $R$ "red". First, we show there exists a good bin. We are given that $D_R(V - R) \ge \lambda D(V - R)$. We apply Lemma 4.8 where there is one "box" for each of the $\log_{1+\delta} n$ bins $I_j$. For each $v \in V - R$, if $v \in I_j$, we place $d(v)$ "balls", of which $d_R(v)$ are red, into box $j$. So, the number of balls in box $j$ equals $D(I_j \cap (V - R))$, out of which $D_R(I_j \cap (V - R))$ are red, and the number of balls total is $D(V - R)$, of which $D_R(V - R)$ are red. Lemma 4.8 tells us, taking $\epsilon = \delta$, that for some $j_0$, if we let $I = I_{j_0} \cap (V - R)$, then:

\begin{align*}
D_R(I) &\ge \delta D_R(V - R)/(\log_{1+\delta} n) \quad \text{and} \tag{4.3}\\
D_R(I) &\ge \lambda(1 - \delta)D(I). \tag{4.4}
\end{align*}

Informally, the set $I$ of non-red vertices has the property that many edges have endpoints in $I$ (by equation (4.3), $D_R(I)$ is at least a $\delta/\log_{1+\delta} n$ fraction of $D_R(V - R)$), that almost a $\lambda$ fraction of the edges leaving $I$ enter red nodes (equation (4.4)), and that all nodes in $I$ have similar degrees (since $I \subseteq I_{j_0}$). We do not know how to distinguish between edges with endpoints in $R$ and other sorts of edges, so we do not know which $I_j$ contains $I$, only that such an $I_j$ must exist.

We now show that for some $v \in R$, the set $N(v) \cap I$ satisfies claims (1) and (2) of Theorem 4.5. Note that this completes the proof because $N(v) \cap [I_{j_0} \cap (V - R)] = N(v) \cap I_{j_0}$, since $v \in R$ and $R$ is an independent set.

Define:

• $R' = \{v \in R : |N(v) \cap I| \ge \delta^2 d/\log_{1+\delta} n\}$.

$R'$ is the set of red vertices $v$ such that $N(v) \cap I$ satisfies claim (1) of Theorem 4.5. We first show that nearly a $\lambda$ fraction of the edges from the set $I$ enter into $R'$, and then use this to show that for some $v \in R'$, claim (2) of Theorem 4.5 holds. So, from the definition of $R'$, we have:

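\begin{align*}
D_{R'}(I) &\ge D_R(I) - |R|\delta^2 d/\log_{1+\delta} n\\
&\ge D_R(I) - D_R(V - R)\delta^2/\log_{1+\delta} n && \text{(since } D_R(V - R) = D(R) \ge d|R|)\\
&\ge D_R(I) - \left(D_R(I)(\log_{1+\delta} n)/\delta\right)\left(\delta^2/\log_{1+\delta} n\right) && \text{(by equation (4.3))}\\
&= D_R(I)(1 - \delta).
\end{align*}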

Finally, applying equation (4.4), we have:

$$D_{R'}(I) \ge \lambda(1 - 2\delta)D(I). \tag{4.5}$$

We now claim that for some $v \in R'$, the set $N(v) \cap I$ satisfies claim (2) of Theorem 4.5. Essentially, the reason for this is that all vertices in $I$ have similar degrees. The actual proof is by contradiction, using a counting argument.

Suppose for contradiction that:¹

$$\text{For all } v \in R',\quad D_{R'}(N(v) \cap I) < \lambda(1 - 3\delta)D(N(v) \cap I). \tag{contr 4.6}$$

If this is the case, then it must also be true that:

$$\sum_{v \in R'} D_{R'}(N(v) \cap I) < \lambda(1 - 3\delta)\sum_{v \in R'} D(N(v) \cap I). \tag{contr 4.7}$$

Now, instead of writing each quantity as a sum over $v \in R'$, we would like to write each as a sum over $w \in I$. We can do this as follows.

We may write the sum $\sum_{v \in R'} D(N(v) \cap I)$ as $\sum_{v \in R'}\left[\sum_{w \in N(v) \cap I} d(w)\right]$ by the definition of $D$. Now, each vertex $w \in I$ is counted in the inside sum $d_{R'}(w)$ times, since $w$ is in the neighborhood of $d_{R'}(w)$ different vertices of $R'$. Thus, $\sum_{v \in R'} D(N(v) \cap I) = \sum_{w \in I} d_{R'}(w)d(w)$. Similarly, $\sum_{v \in R'} D_{R'}(N(v) \cap I) = \sum_{w \in I} d_{R'}(w)^2$.

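Applying the inequality (contr 4.7) we have assumed for contradiction, we get:

\begin{align*}
\sum_{w \in I} d_{R'}(w)^2 &< \lambda(1 - 3\delta)\sum_{w \in I} d_{R'}(w)d(w)\\
&\le \lambda(1 - 3\delta)\sum_{w \in I} d_{R'}(w)(1+\delta)^{j_0+1} && \text{(since } d(w) < (1+\delta)^{j_0+1} \text{ for all } w \in I)\\
&= \lambda(1 - 3\delta)(1+\delta)^{j_0+1}D_{R'}(I). && \text{(by definition of } D_{R'}) \tag{4.8}
\end{align*}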

For any collection of values, the average of the squares is at least the square of the average. Thus:

$$\frac{1}{|I|}\sum_{w \in I} d_{R'}(w)^2 \ge \left(\frac{1}{|I|}\sum_{w \in I} d_{R'}(w)\right)^2 = \frac{D_{R'}(I)^2}{|I|^2}.$$



¹ It is always dangerous to display false equations, so we are labeling these inequalities with the symbol "contr" to emphasize that they are just being assumed for contradiction.
So, $D_{R'}(I)^2/|I| \le \sum_{w \in I} d_{R'}(w)^2$. Combining this fact with equation (4.8), we have:

$$\frac{D_{R'}(I)^2}{|I|} < \lambda(1 - 3\delta)(1+\delta)^{j_0+1}D_{R'}(I). \tag{4.9}$$

Multiplying both sides of equation (4.9) by $|I|/D_{R'}(I)$, we get:

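\begin{align*}
D_{R'}(I) &< \lambda(1 - 3\delta)(1+\delta)^{j_0+1}|I|\\
&\le \lambda(1 - 3\delta)(1+\delta)D(I) && \text{(since } d(w) \ge (1+\delta)^{j_0} \text{ for all } w \in I)\\
&< \lambda(1 - 2\delta)D(I).
\end{align*}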

This contradicts equation (4.5) and completes the proof of Theorem 4.5. ■ 

Proof of Theorem 4.6: We are given a set $S$ such that $D_R(S) \ge \lambda' D(S)$; that is, at least a fraction $\lambda'$ of the edges leaving the set $S$ (double-counting edges with both endpoints in $S$) enter into $R$. We want to show that at least one of the sets $N_i(S)$ is both large and has nearly a fraction $\lambda'$ of its vertices in $R$. To do so, we apply Lemma 4.8 where we have one "box" for each set $N_i(S)$. We place a ball in box $i$ for each endpoint in $N_i(S)$ of an edge from $S$ to $N_i(S)$. A ball is red if the endpoint to which it corresponds is in $R$. The number of balls in box $i$ is $D_{N_i(S)}(S)$, of which $D_{N_i(S) \cap R}(S)$ are red, and the number of balls total in the $\log_{1+\delta} n$ boxes is $D(S)$, of which $D_R(S)$ are red. By Lemma 4.8, taking $\epsilon = \delta$, for some $i_0$ ($0 \le i_0 < \log_{1+\delta} n$),

1. $D_{N_{i_0}(S) \cap R}(S) \ge \delta D_R(S)/(\log_{1+\delta} n)$ and (4.10)

2. $D_{N_{i_0}(S) \cap R}(S)/D_{N_{i_0}(S)}(S) \ge (1 - \delta)\lambda'$. (4.11)

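By definition of $N_{i_0}(S)$, each vertex in $N_{i_0}(S)$ is incident to at least $(1+\delta)^{i_0}$ and fewer than $(1+\delta)^{i_0+1}$ edges from $S$. Thus,

$$D_{N_{i_0}(S) \cap R}(S) \le |N_{i_0}(S) \cap R|(1+\delta)^{i_0+1}$$

and

$$D_{N_{i_0}(S)}(S) \ge |N_{i_0}(S)|(1+\delta)^{i_0},$$

which implies that:

\begin{align*}
|N_{i_0}(S) \cap R|/|N_{i_0}(S)| &\ge \left[D_{N_{i_0}(S) \cap R}(S)/D_{N_{i_0}(S)}(S)\right]/(1+\delta)\\
&\ge (1 - \delta)\lambda'/(1+\delta)\\
&\ge (1 - 2\delta)\lambda'. \tag{4.12}
\end{align*}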

Equations (4.10) and (4.12) show that the index $i_0$ satisfies both claims of the theorem. ■
4.4 Applying the vertex-cover approximation

Given a graph $H$ on $N$ vertices, $M$ edges, and with a minimum vertex cover of size $N_{VC}$, the BE/MS vertex-cover algorithm [4], [28] discussed earlier (and also presented as algorithm Approx-IS in Appendix A) finds a vertex cover of size at most $\left(2 - \frac{\log\log N}{2\log N}\right)N_{VC}$ in time $O(NM)$.

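If $H$ has an independent set with at least $\frac{1}{2}\left(1 - \frac{1}{\log N}\right)N$ vertices, it must have a vertex cover of at most $\frac{1}{2}\left(1 + \frac{1}{\log N}\right)N$ vertices. So, the algorithm will find a vertex cover $W \subseteq V(H)$ of size at most:

$$\frac{1}{2}\left(1 + \frac{1}{\log N}\right)\left(2 - \frac{\log\log N}{2\log N}\right)N \le \left(1 - \frac{\log\log N}{4\log N} + \frac{1}{\log N}\right)N \le \left(1 - \frac{1}{\log N}\right)N,$$

where the last step holds once $\log\log N \ge 8$. Since $W$ is a vertex cover, $V(H) - W$ is an independent set of size at least $N/\log N$ for all sufficiently large $N$, that is, of size $\Omega(N/\log N)$. So, we have the following lemma.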

Lemma 4.9 Given a graph $H$ on $N$ vertices with an independent set of size at least $\frac{1}{2}\left(1 - \frac{1}{\log N}\right)N$, the BE/MS algorithm can be used to find in polynomial time an independent set of size $\Omega(N/\log N)$.

We now prove the Main Theorem (4.3). 

Proof of Theorem 4.3: Step 1 of algorithm First-Approx ensures that no vertex has degree less than $f(n)$ for $f(n) = n^{2/5}\log^{8/5} n$. Step 2 ensures that no two vertices share more than $n/f(n)^2$ neighbors. Applying these values to Corollary 4.7 of the previous section (with $d_{\min} = f(n)$ and $s = n/f(n)^2$) yields the result that, of the $O(n\log^4 n)$ subsets generated in Step 3 of Algorithm First-Approx, at least one set $T = T_{v,i,j}$ has $\Omega(f(n)^4/(n\log^7 n))$ vertices, of which at least a fraction $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ are colored red under some legal 3-coloring of G. By Lemma 4.9, since $\frac{1}{2}\left(1 - \frac{1}{\log n}\right) \ge \frac{1}{2}\left(1 - \frac{1}{\log|T|}\right)$, Step 4 of Algorithm First-Approx will find an independent set in $T$ of size $\Omega(f(n)^4/(n\log^8 n))$. We can thus make progress of Type 1 [Large-IS] on some $T_{v,i,j}$ in Step 4 of Algorithm First-Approx so long as:

$$f(n)^4/(n\log^8 n) = \Omega(n/f(n)).$$

Equivalently, we make progress towards an $O(f(n))$-coloring so long as $f(n)^5 = \Omega(n^2\log^8 n)$, or $f(n) = \Omega(n^{2/5}\log^{8/5} n)$. Thus, we have proved the Main Theorem. ■



Chapter 5 

Worst-case bounds for 3-colorable graphs: 
improved algorithm 



In this chapter, we present a procedure that improves on the bounds achieved by Algorithm First-Approx given in Chapter 4. The essence of the new algorithm is an improved method for forcing expansion (see Section 4.1) and making progress from regions of high density in a 3-colorable graph. This improves performance and results in coloring $n$-vertex 3-colorable graphs with only $\tilde{O}(n^{3/8})$ colors.

Algorithm First-Approx performs most poorly when the input graph consists of a collection of high-density regions or "clumps," with a lower density of edges between clumps. In particular, it performs worst when the set $S = N(v)\cap I_j$ has a large fraction of its neighbors hit by about $n^{0.2}$ edges from vertices in $S$. Here we present an additional tool for making progress from such dense regions and thus improve the coloring bound.

5.1 A useful lemma 

We now present a strengthening of Corollary 4.2, described in Lemma 5.1 below, that allows us to force a 3-colorable graph $G$ to behave in a certain "nice" way. In particular, for any vertex $v$ of $G$ and any subset $S$ of $N(v)$ of size at least $(n\log^2 n)/f(n)^2$, the lemma allows us without loss of generality to force $S$ to contain $\Omega(|S|/\log n)$ vertices of each of the two available colors (that is, the colors that $v$ does not have), or else make progress towards an $O(f(n))$-coloring of $G$. This will be useful for forcing sets to expand "roughly evenly" into vertices of the available colors in the graph. As with Corollary 4.2, this lemma requires the graph to be 3-colorable.

Let f(n) be some nearly-polynomial function. 

Lemma 5.1 Given a set S C V{G) of size fi((nlog 2 n)//(n) 2 ), we can either make progress 
towards an 0(f(n))- coloring of G or else guarantee that under every legal 3-coloring of G, 
set S contains less than (1 — 4 J n )\S\ vertices of any given color class. 

The idea of the proof is that if $S$ consists of vertices nearly all of one color, say red, then its neighborhood should contain mostly blue and green vertices and have few red vertices. If this occurs, then $N(S)$ will have a large independent set of size $\max\{|N(S)\cap\text{green}|, |N(S)\cap\text{blue}|\}$. One can thus make progress on $N(S)$ using the BE/MS Vertex-Cover algorithm. The difficulty with this approach is that the neighborhood $N(S)$ need not have few red vertices. It could be, for example, that the red vertices in $S$ tend to have a smaller degree than the others. Or, even if all vertices have the same degree, it could be that edges from the blue and green vertices of $S$ all enter into different vertices in $N(S)$, but edges from red vertices in $S$ tend to hit many vertices multiple times. To handle these difficulties, we will run a procedure separating vertices and neighborhoods into bins depending on degree, in a similar manner to that done in the proofs of Theorems 4.5 and 4.6.

Proof of Lemma 5.1: 

For convenience, let red be the color with the most vertices in $S$. The first goal is to find a large independent set $S' \subseteq S$. We can do this in a greedy fashion by deleting arbitrary edges from $S$. That is, begin with $S' = S$, and while $S'$ is not an independent set, pick an arbitrary edge $(a, b)$ between two vertices of $S'$ and delete the endpoints from $S'$ (let $S' \leftarrow S' - \{a, b\}$). If we ever have deleted more than $|S|/(4\log n)$ edges from $S$, this means we must have removed over $|S|/(4\log n)$ vertices not in red from $S$ (an edge can have at most one endpoint in red). So, we can guarantee that no color comprises more than $\left(1 - \frac{1}{4\log n}\right)$ of the vertices of $S$ and halt. Otherwise (we do not delete more than $|S|/(4\log n)$ edges from $S$), we will end with $S'$ an independent set of size at least $\left(1 - \frac{1}{2\log n}\right)|S|$, which is $\Omega((n\log^2 n)/f(n)^2)$.
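The greedy step above can be sketched as follows, assuming the graph is given as a Python dict mapping each vertex to its set of neighbors (a representation chosen only for illustration). The function returns $S'$ together with the number of deleted edges, so a caller can compare that count against the $|S|/(4\log n)$ threshold used in the proof.

```python
def greedy_independent_subset(adj: dict, S: set) -> tuple[set, int]:
    """Repeatedly delete both endpoints of an edge inside S' until S' is independent.

    Returns (S', number_of_deleted_edges).
    """
    remaining = set(S)
    deleted_edges = 0
    for a in sorted(S):  # deterministic order; the proof allows any order
        if a not in remaining:
            continue
        # Any neighbor of a still in S' gives an edge (a, b) inside S'.
        b = next(iter(adj[a] & remaining), None)
        if b is not None:
            remaining -= {a, b}
            deleted_edges += 1
    return remaining, deleted_edges
```

A single pass suffices: if an edge $(a, c)$ survived, then when $a$ was processed $c$ was still present, so $a$ would have been deleted.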

Since $S'$ is independent and has size $\Omega((n\log^2 n)/f(n)^2)$, we can make progress Type 2 [Small-Nbhd] towards an $O(f(n))$-coloring of $G$ if $|N(S')| < (n\log^2 n)/f(n)$, in which case we halt with "progress made". Otherwise, let $T = N(S')$, so $|T| \ge (n\log^2 n)/f(n)$.

The basic idea of the procedure now is the following. We first "throw out" edges so that the vertices in $S'$ have disjoint neighborhoods in $T$. If at this point all vertices in $S'$ had the same degree, we would be done: if set $S'$ consisted almost entirely of red vertices, then set $T$ would consist almost entirely of blue and green vertices. Since the vertices of $S'$ may have differing degrees, we partition $S'$ into bins based on degree in a similar fashion as done with the sets $I_j$ defined in Section 3.2. For each bin, either it contains a good fraction of non-red vertices, or else its neighborhood is mostly blue and green. Thus, if a bin has many neighbors in $T$, we can either make progress using the BE/MS algorithm on the neighborhood or else have a guaranteed number of non-red vertices in $S'$ (recall, our final goal is to guarantee that $S$ has at least $|S|/(4\log n)$ non-red vertices). Formally, we perform the following steps.

1. For each vertex $w$ in $T$, arbitrarily mark one of the edges from $w$ into $S'$. Let $E'$ be the set of marked edges. Now, for each $v \in S'$, define its marked neighborhood $N'(v)$ by:

$$N'(v) = \{w \in T \mid (v, w) \in E'\}.$$

For any set $A \subseteq S'$, define the marked neighborhood of $A$ similarly to be:

$$N'(A) = \bigcup_{v\in A} N'(v).$$

Note that by definition of $E'$, if $A$ and $B$ are disjoint subsets of $S'$, then their marked neighborhoods are disjoint as well, because each $w \in T$ is in the marked neighborhood of only one vertex of $S'$. (See Figure 5.1.)

[Figure 5.1: Vertices in $S'$ have disjoint marked neighborhoods. If the vertices had nearly identical "marked degree," then a mostly red set $S'$ would imply a mostly blue and green set $T$.]

2. Partition $S'$ into subsets such that in each subset, if we consider only the edges in $E'$, the minimum degree is at least half of the maximum degree. In particular, we partition $S'$ into sets $S_0, \ldots, S_m$ for $m \le \log n$ such that:

$$S_i = \{v \in S' : |N'(v)| \in [2^i, 2^{i+1} - 1]\}.$$

(We may ignore vertices in $S'$ with no marked neighbors.)

Observation: Notice that if more than a fraction $\left(1 - \frac{1}{2\log n}\right)$ of the vertices of some $S_i$ are red, then at most $\frac{1}{\log n}$ of the vertices in $N'(S_i)$ can be red, since the non-red vertices in $S_i$ can have at most twice as large a marked neighborhood in $T$ as the red vertices do (and, as noted in Step 1, marked neighborhoods of disjoint subsets of $S'$ are disjoint).
3. Now, pick $i_0$ such that $|N'(S_{i_0})|$ is maximized; so $|N'(S_{i_0})| \ge \frac{1}{1+\log n}|T|$, since there are at most $(1 + \log n)$ sets $S_i$ and their marked neighborhoods are disjoint and together cover $T$. Note that $i_0$ is not necessarily the largest index, since lower-index sets might have enough vertices to compensate for having fewer neighbors per vertex.

4. We now apply the BE/MS vertex-cover algorithm (or equivalently, the independent set approximation algorithm Approx-IS given in Appendix A) to the set $N'(S_{i_0})$. If it finds an independent set of size $\Omega(n/f(n))$, then we have made progress Type 1 [Large-IS] and can halt with "progress made".

The reason we apply the BE/MS vertex-cover algorithm is that if more than a fraction $\left(1 - \frac{1}{2\log n}\right)$ of the vertices of $S_{i_0}$ are red, then by the observation in Step 2, $N'(S_{i_0})$ has at most a $\frac{1}{\log n}$ fraction of its vertices red, so $N'(S_{i_0})$ has an independent set of at least $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ of its vertices, namely either $N'(S_{i_0})\cap\text{blue}$ or $N'(S_{i_0})\cap\text{green}$, whichever is larger. Thus, by Lemma 4.9, we find an independent set of size $\Omega(|N'(S_{i_0})|/\log n) = \Omega(n/f(n))$, since we have assumed $|T| \ge (n\log^2 n)/f(n)$ and $|N'(S_{i_0})| \ge \frac{1}{1+\log n}|T|$.

So, if we do not make progress, we know it is not true that more than $\left(1 - \frac{1}{2\log n}\right)$ of the vertices of $S_{i_0}$ are red.

5. If we did not make progress in Step 4, we know that at least $\frac{1}{2\log n}$ of the vertices in $S_{i_0}$ are blue or green. Now, let $S' \leftarrow S' - S_{i_0}$ and let $T = N(S')$.

If $S'$ has not been reduced to less than 1/3 of its original size, then go back to Step 1. Notice that in this case, we may still assume that $|T| \ge (n\log^2 n)/f(n)$ since $S'$ still has size $\Omega((n\log^2 n)/f(n)^2)$.

If $S'$ is less than 1/3 of its original size, then go on to Step 6.

6. If we reach this step, it means we have reduced $S'$ to less than a third of its original size, and have done so by removing from $S'$ sets containing at least a $\frac{1}{2\log n}$ fraction of blue and green vertices. Since $S'$ originally had size at least $\left(1 - \frac{1}{2\log n}\right)|S|$, this implies we must have removed more than:

$$\frac{2}{3}\left(1 - \frac{1}{2\log n}\right)|S|\cdot\frac{1}{2\log n} \;\ge\; \frac{|S|}{4\log n} \qquad (\log n \ge 2)$$

blue and green vertices from $S$. So, we may halt with the guarantee asked for in the statement of the lemma, since set $S$ could not possibly have contained as many as $\left(1 - \frac{1}{4\log n}\right)|S|$ red vertices. ■
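Steps 1 through 3 of this procedure (marking edges, bucketing $S'$ by marked degree, and picking $i_0$) can be sketched as below. The adjacency-dict representation and the rule "mark the edge to the smallest neighbor in $S'$" are assumptions made for illustration; the proof allows an arbitrary choice. Bucket $i$ holds the vertices whose marked degree lies in $[2^i, 2^{i+1}-1]$, that is, those with $\lfloor\log_2|N'(v)|\rfloor = i$.

```python
from collections import defaultdict


def mark_and_bucket(adj: dict, S_prime: set, T: set):
    """Return (marked_nbhd, buckets, i0) for Steps 1-3 of the proof of Lemma 5.1."""
    # Step 1: each w in T keeps exactly one marked edge into S'.
    marked_nbhd = defaultdict(set)
    for w in T:
        candidates = adj[w] & S_prime
        if candidates:
            marked_nbhd[min(candidates)].add(w)
    # Step 2: S_i = {v : |N'(v)| in [2^i, 2^(i+1) - 1]}; bit_length() equals i + 1.
    buckets = defaultdict(set)
    for v, nbhd in marked_nbhd.items():
        buckets[len(nbhd).bit_length() - 1].add(v)

    # Step 3: choose i0 maximizing |N'(S_i)| (marked neighborhoods are disjoint,
    # so |N'(S_i)| is just the sum of the marked degrees in bucket i).
    def marked_size(i):
        return sum(len(marked_nbhd[v]) for v in buckets[i])

    i0 = max(buckets, key=marked_size) if buckets else None
    return marked_nbhd, buckets, i0
```

Because every $w \in T$ contributes to exactly one marked neighborhood, the bucket sizes $|N'(S_i)|$ sum to $|T|$, which is what gives $|N'(S_{i_0})| \ge |T|/(1+\log n)$.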
5.2 Making progress from dense regions

We will now use Lemma 5.1 to help take advantage of certain types of dense regions in 3-colorable graphs. In particular, we consider the case of two sets of vertices $S$ and $T$ where $S$ is 2-colored under some legal 3-coloring of $G$ and the number of edges between $S$ and $T$ is large compared with the sizes of the two sets. This occurs when $S$ is a subset of the neighborhood of a vertex (e.g., a set $N(v)\cap I_j$) and $T$ is some set $N_i(S)$ for a large $i$ (see Section 3.2).

Theorem 5.2 Given sets of vertices $S$ and $T$ in an $n$-vertex 3-colorable graph $G$, such that

1. $S$ is 2-colored under some legal 3-coloring of $G$,

2. $D_T(S) = \Omega(|S|(n\log^2 n)/f(n)^2)$, and

3. $[D_T(S)]^3 = \Omega\left(\left[|S| + \max_{v\in S} d_T(v)\right]\times\left[|S||T|^2(n\log n)/f(n)^2 + |T||S|^2 n^2/f(n)^4\right]\right)$,

then we can make progress towards an $O(f(n))$-coloring of $G$.

Before proving this theorem, let us first make sense of the condition on $[D_T(S)]^3$ by considering a few examples. Suppose we wish to color with $f(n) = n^{3/8}$ colors, the set $S$ has size $n^{3/8}$, and each vertex $v$ in $S$ has degree $n^{3/8}$ into $T$. Then, $D_T(S)/|S| = n^{3/8}$, which is greater than $n^{1/4}\log^2 n$ (condition 2). The main condition (condition 3) reduces to:

$$n^{18/8} \ge c\,n^{3/8}\left[|T|^2 n^{5/8}\log n + |T|\,n^{10/8}\right].$$

Ignoring logarithmic factors, the theorem assures us we make progress if $|T| = O(n^{5/8})$. This is the basic idea for the $O(n^{3/8}\log^{5/2} n)$-coloring algorithm described later. For that application of this theorem, if $T$ has $\Omega(n^{5/8})$ vertices, we will be able to find a large independent set inside $T$, and thus make progress of Type 1.

As another example, if we wished to color with $n^{0.35}$ colors, $S$ had size $n^{0.35}$, and each vertex in $S$ had degree $n^{0.35}$ into $T$, then the main condition reduces to

$$n^{2.1} \ge c\,n^{0.35}\left[|T|^2 n^{0.65}\log n + |T|\,n^{1.3}\right].$$

In this case, we only make progress if $|T| = O(n^{0.45})$ (here the $|T|n^{1.3}$ term is dominant). However, we do not know how to make use of forcing $|T| = \Omega(n^{0.45})$.
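Once logarithmic factors are ignored, the two examples above are exponent arithmetic. The helper below is our own device, not part of the thesis: it takes $f(n) = n^{\phi}$, $|S| = n^{\sigma}$ and per-vertex degree $n^{\rho}$ into $T$ (so $D_T(S) = n^{\sigma+\rho}$), and returns the largest exponent $\tau$ with $|T| = n^{\tau}$ for which condition 3 still holds up to polylogarithmic factors.

```python
from fractions import Fraction as F


def max_T_exponent(phi, sigma, rho):
    """Largest tau such that condition 3 of Theorem 5.2 holds (ignoring log factors).

    Exponents: f(n) = n^phi, |S| = n^sigma, d_T(v) = n^rho for v in S.
    Condition 3 then reads
        3(sigma + rho) >= max(sigma, rho)
                          + max(sigma + 2 tau + 1 - 2 phi, tau + 2 sigma + 2 - 4 phi).
    """
    slack = 3 * (sigma + rho) - max(sigma, rho)
    from_first_term = (slack - sigma - 1 + 2 * phi) / 2  # |S||T|^2 n / f^2 term
    from_second_term = slack - 2 * sigma - 2 + 4 * phi   # |T||S|^2 n^2 / f^4 term
    return min(from_first_term, from_second_term)


print(max_T_exponent(F(3, 8), F(3, 8), F(3, 8)))     # 5/8
print(max_T_exponent(F(7, 20), F(7, 20), F(7, 20)))  # 9/20
```

For the $n^{0.35}$ example the second term gives $9/20$ while the first gives $11/20$, matching the remark that the $|T|n^{1.3}$ term is the binding one.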

Proof of Theorem 5.2: For convenience, let blue and green be the two colors that appear in $S$, and let us define the following notation.

• Let $D_{total} = D_T(S)$.

• Let $d_{avg} = D_{total}/|S|$ be the average degree into $T$ of vertices in $S$.
We want to keep track of those vertices of $T$ that have a reasonably large degree into $S$, so we define a subset $T'$ of $T$ by:

• $T' = \{w \in T \mid d_S(w) \ge D_{total}/(2|T|)\}$.

Since $D_S(T - T') < |T|\left[D_{total}/(2|T|)\right]$, we have $D_S(T') \ge \frac{1}{2}D_{total}$, or equivalently,

$$D_{T'}(S) \ge D_{total}/2. \tag{5.1}$$

We also want to look at those vertices in $S$ that have reasonably large degree into $T'$, so define:

• $S' = \{v \in S \mid d_{T'}(v) \ge D_{T'}(S)/(2|S|)\}$.

Since $D_{T'}(S - S') < |S|\left[D_{T'}(S)/(2|S|)\right]$, we have $D_{T'}(S') \ge \frac{1}{2}D_{T'}(S)$, which by equation (5.1) implies:

$$D_{T'}(S') \ge D_{total}/4. \tag{5.2}$$

Also, by definition of $S'$ and equation (5.1), if $v \in S'$ then $d_{T'}(v) \ge D_{total}/(4|S|)$, or equivalently,

$$d_{T'}(v) \ge \tfrac{1}{4}d_{avg} \quad\text{for all } v \in S'. \tag{5.3}$$

Since we are given (condition 2) that $d_{avg} = \Omega((n\log^2 n)/f(n)^2)$, this implies that all $v \in S'$ have $d_T(v) \ge d_{T'}(v) = \Omega((n\log^2 n)/f(n)^2)$. Thus, by Lemma 5.1 (applied to the sets $N(v)\cap T$), we can guarantee that each vertex $v \in S'$ has at least a fraction $\frac{1}{4\log n}$ of its edges into $T$ entering into non-red vertices.
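The thresholding that defines $T'$ and $S'$ is easy to check mechanically. The sketch below is our own illustration: the edges between $S$ and $T$ are given as a list of $(s, t)$ pairs, and the function builds both sets. Inequalities (5.1) through (5.3) follow from averaging alone, so they should hold on any input, not just on 3-colorable graphs.

```python
from collections import Counter


def thin_out(S, T, edges):
    """Build T' and S' as in the proof of Theorem 5.2.

    `edges` lists the (s, t) pairs with s in S and t in T, without duplicates.
    Returns (T_prime, S_prime, D_total, D_Tp_Sp), where D_Tp_Sp = D_{T'}(S').
    """
    D_total = len(edges)
    d_S = Counter(t for _, t in edges)  # degree of each t into S
    T_prime = {t for t in T if d_S[t] >= D_total / (2 * len(T))}
    d_Tp = Counter(s for s, t in edges if t in T_prime)  # degrees into T' only
    D_Tp_S = sum(d_Tp.values())
    S_prime = {s for s in S if d_Tp[s] >= D_Tp_S / (2 * len(S))}
    D_Tp_Sp = sum(d_Tp[s] for s in S_prime)
    return T_prime, S_prime, D_total, D_Tp_Sp
```

On any input one can then assert $D_{T'}(S') \ge D_{total}/4$ (equation 5.2) and $d_{T'}(v) \ge d_{avg}/4$ for every $v \in S'$ (equation 5.3).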

So, for some non-red color, say green without loss of generality, at least $D_T(S')/(8\log n)$ edges from $S'$ enter into green vertices of $T$. This implies that some green vertex $g \in T$ has degree at least $D_T(S')/(8|T|\log n)$ into $S'$. Now, define (see Figure 5.2):

• $X = N(g)\cap S'$.

• $Y = N(X)\cap T'$.

So, we have:

$$\begin{aligned}
|X| &\ge D_T(S')/(8|T|\log n)\\
&\ge D_{total}/(32|T|\log n)\\
&= \Omega\left(\left(\frac{|S|}{|T|}\right)\left(\frac{d_{avg}}{\log n}\right)\right).
\end{aligned}\tag{5.4}$$

Note that set $X$ consists entirely of blue vertices, and since $Y$ is in the neighborhood of a blue set, $Y$ contains only red and green vertices. We want to show that $Y$ is large, because we will later intersect $Y$ with a red and blue set to get a large monochromatic (red) set, which will allow us to make progress. We show that $Y$ must be large as follows.

[Figure 5.2: Vertex $g$ and the sets $X$ and $Y$. Also, green vertex $g' \in S$ (defined later) and the intersecting neighborhoods.]

By Theorem 4.1 we may assume that no two vertices of $X$ share more than $n/f(n)^2$ neighbors in $T'$. Now suppose that $|X| \le \frac{f(n)^2}{n}\left(\frac{1}{8}d_{avg}\right)$. In this case, each vertex $v \in X$ can share at most $|X|(n/f(n)^2) \le \frac{1}{8}d_{avg}$ neighbors with all of the other vertices in $X$. This implies, by equation (5.3), that $v$ must have at least $\frac{1}{8}d_{avg}$ neighbors in $T'$ not shared with any other vertices of $X$. So, set $Y$ must have size at least $\Omega(|X|d_{avg})$.

If $|X| > \frac{f(n)^2}{n}\left(\frac{1}{8}d_{avg}\right)$, then if we only consider the first $\frac{f(n)^2}{n}\left(\frac{1}{8}d_{avg}\right)$ of the vertices of $X$, we still get that $|Y| = \Omega\left(\frac{f(n)^2}{n}(d_{avg})^2\right)$. So, whichever case occurs, we have:

$$|Y| = \Omega\left(\min\left\{|X|d_{avg},\ \frac{f(n)^2}{n}(d_{avg})^2\right\}\right). \tag{5.5}$$

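By definition, $Y$ is a subset of $T'$, and vertices of $T'$ all have a high degree into $S$. So, we can lower bound the degree of $Y$ into $S$ by:

$$\begin{aligned}
D_S(Y) &\ge \left(\frac{D_{total}}{2|T|}\right)|Y|\\
&= \frac{|S|}{2|T|}\,d_{avg}\,|Y|\\
&= \Omega\left(\min\left\{|X|\left(\frac{|S|}{|T|}\right)(d_{avg})^2,\ \frac{f(n)^2}{n}(d_{avg})^3\left(\frac{|S|}{|T|}\right)\right\}\right) \quad\text{(by equation 5.5)}\\
&= \Omega\left(\min\left\{\left(\frac{|S|}{|T|}\right)^2(d_{avg})^3/\log n,\ \frac{f(n)^2}{n}(d_{avg})^3\left(\frac{|S|}{|T|}\right)\right\}\right). \quad\text{(by equation 5.4)}
\end{aligned}\tag{5.6}$$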

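Now we apply condition 3 in the statement of the theorem. The condition (dividing both sides by $|S|^3$) states that $(d_{avg})^3 = \left[|S| + \max_{v\in S} d_T(v)\right]\cdot\Omega\left(\frac{|T|^2 n\log n}{|S|^2 f(n)^2} + \frac{|T|n^2}{|S|f(n)^4}\right)$. So, this implies both that:

$$\left(\frac{|S|}{|T|}\right)^2(d_{avg})^3/\log n = \left[|S| + \max_{v\in S} d_T(v)\right]\cdot\Omega\left(\frac{n}{f(n)^2}\right) \tag{5.7}$$

and

$$\frac{f(n)^2}{n}(d_{avg})^3\left(\frac{|S|}{|T|}\right) = \left[|S| + \max_{v\in S} d_T(v)\right]\cdot\Omega\left(\frac{n}{f(n)^2}\right). \tag{5.8}$$

Thus, combining both equations (5.7) and (5.8) with equation (5.6), we get:

$$D_S(Y) = \Omega\left(\frac{n}{f(n)^2}\left[|S| + \max_{v\in S} d_T(v)\right]\right). \tag{5.9}$$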

It now must be that one of the following two cases occurs. The first case is that there is some green vertex $g' \in S$ in the neighborhood of more than $\frac{1}{2}D_S(Y)/|S|$ vertices of $Y$. In this case, according to equation (5.9), it must be that $d_Y(g') = \Omega(n/f(n)^2)$. So, $N(g')\cap Y$ is a set of $\Omega(n/f(n)^2)$ vertices, all of which are red since $N(g') \subseteq \text{blue}\cup\text{red}$ and $Y \subseteq \text{red}\cup\text{green}$; see Figure 5.2. Thus, we can make progress on this monochromatic set using Corollary 4.2.

The other possibility is that no green vertex in $S$ is in the neighborhood of more than $\frac{1}{2}D_S(Y)/|S|$ vertices of $Y$. In this case, the set of all vertices in $S$ hit by more than $\frac{1}{2}D_S(Y)/|S|$ edges from $Y$ is all blue. Define $Z$ to be that set; that is:

• $Z = \{v \in S \mid d_Y(v) > \frac{1}{2}D_S(Y)/|S|\}$.

Clearly, the number of edges between vertices of $Y$ and vertices in $(S - Z)$ is at most $|S|\left(\frac{1}{2}D_S(Y)/|S|\right) = \frac{1}{2}D_S(Y)$. So, $D_Z(Y) \ge \frac{1}{2}D_S(Y)$. Thus, we can bound the size of $Z$ by:

$$|Z| \ge \tfrac{1}{2}D_S(Y)\Big/\max_{v\in S} d_Y(v) \ge \tfrac{1}{2}D_S(Y)\Big/\max_{v\in S} d_T(v),$$

which by equation (5.9) implies:

$$|Z| = \Omega(n/f(n)^2).$$

Since $Z$ is monochromatic (blue), we can now use Corollary 4.2 to make progress. So, whichever of the two cases occurs, we have made progress towards an $O(f(n))$-coloring. The final algorithm for making progress given our sets $S$ and $T$ is as follows:

Algorithm Dense-Region-Progress:

Given: Sets $S$ and $T$ satisfying the conditions of Theorem 5.2 in some graph $G$.
Output: Progress towards an $O(f(n))$-coloring of $G$.

1. Run the algorithm of Lemma 5.1 on $N(v)\cap T$ for all $v \in S$. If any runs make progress towards an $O(f(n))$-coloring, then halt. Otherwise, we know there are many edges from $S$ into red, blue, and green vertices of $T$ under any legal 3-coloring of $G$.

2. If for some pair of vertices $u, v \in S$, we have $|N(u)\cap N(v)| > n/f(n)^2$, then use Theorem 4.1 to make progress.

3. Otherwise, for each vertex $v \in T$,

(a) let $Y = N(N(v)\cap S)\cap T$ and let $Z = \{w \in S : d_Y(w) > n/f(n)^2\}$.

(Note that we do not need to use the sets $S'$ and $T'$; they were just convenient for the analysis.)

(b) Run the algorithm of Corollary 4.2 on $Z$.

(c) For each $w \in Z$, run the algorithm of Corollary 4.2 on $Y\cap N(w)$.

The above proof guarantees that this algorithm makes progress. ■
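Step 3(a) only needs neighborhood and degree computations. The sketch below computes, for one vertex $v \in T$, the sets $Y$ and $Z$ exactly as defined in Step 3(a), plus the sets $Y\cap N(w)$ used in Step 3(c). The threshold $n/f(n)^2$ is passed in as a plain number, the adjacency-dict representation is an assumption, and the calls to Corollary 4.2 are left to the caller because that routine is not reproduced in this section.

```python
def candidate_sets(adj: dict, S: set, T: set, v, threshold: float):
    """Return (Y, Z, red_candidates) for one v in T, following Step 3(a).

    Y = N(N(v) & S) & T
    Z = {w in S : d_Y(w) > threshold}, threshold playing the role of n / f(n)^2
    red_candidates maps each w in Z to Y & N(w), the sets handed to
    Corollary 4.2 in Step 3(c).
    """
    X = adj[v] & S
    Y = set().union(*(adj[x] for x in X)) & T
    Z = {w for w in S if len(adj[w] & Y) > threshold}
    red_candidates = {w: adj[w] & Y for w in Z}
    return Y, Z, red_candidates
```

Running this for every $v \in T$ costs polynomial time, which is all the analysis requires.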

5.3 The coloring algorithm 

We now combine Algorithms First-Approx and Dense-Region-Progress to get an improved algorithm guaranteed to $\tilde{O}(n^{3/8})$-color any $n$-vertex 3-colorable graph.

Algorithm Improved-Approx:

Given: $G = (V, E)$, a 3-colorable graph on $n$ vertices. Let $f(n) = n^{3/8}(\log n)^{5/2}$.
Output: Progress towards an $O(f(n))$-coloring of $G$.

1. For each vertex $v$, if $d(v) < f(n)$, make progress Type 2 [Small-Nbhd].

2. Otherwise, for each vertex $v$, for each $i, j \in \{0, 1, \ldots, 5(\log n)^2\}$:

(a) Let $S = N(v)\cap I_j$.

(b) Let $T = N_i(S)$.

(c) If $|T| \ge n^{5/8}/(\log n)^{3/2}$, run the BE/MS Vertex-Cover approximation algorithm. If we find an independent set of size at least $n/f(n)$, we have made progress Type 1 [Large-IS].

(d) If $S$ and $T$ satisfy the conditions of Theorem 5.2, then make progress using Algorithm Dense-Region-Progress.

Theorem 5.3 Algorithm Improved-Approx will make progress towards an $O(n^{3/8}(\log n)^{5/2})$-coloring of any $n$-vertex 3-colorable graph.

Proof: Assume Algorithm Improved-Approx does not make progress in Step 1. So, we know that the minimum degree $d \ge f(n) = n^{3/8}(\log n)^{5/2}$. As in Chapter 4, let $R$ = red be the color class with $D(\text{red}) = \max(D(\text{red}), D(\text{blue}), D(\text{green}))$.
We now apply some of the facts proven in Section 4.3.2. Theorem 4.5 guarantees us that for some vertex $v \in R$ and some index $j$, the set $S = N(v)\cap I_j$ in Step 2(a) has the property that:

$$|S| \ge \delta^2 f(n)/\log_{1+\delta} n, \quad\text{and} \tag{5.10}$$

$$D_R(S) \ge \tfrac{1}{2}(1-3\delta)D(S), \tag{5.11}$$

where $\delta = \frac{1}{5\log n}$. Note that for the given value of $f$, equation (5.10) and the definition of $\delta$ imply that:

$$|S| = \Omega(n^{3/8}/(\log n)^{3/2}). \tag{5.12}$$

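Theorem 4.6 (using $\lambda' = \frac{1}{2}(1-3\delta)$) shows that for some index $i$, the set $T = N_i(S)$ of Step 2(b) has the property that:

$$D_{T\cap R}(S) \ge \delta D_R(S)/\log_{1+\delta} n, \quad\text{and} \tag{5.13}$$

$$|T\cap R|/|T| \ge \tfrac{1}{2}(1-2\delta)(1-3\delta). \tag{5.14}$$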

Let us now, for the rest of the proof, fix two such sets S and T satisfying equations (5.10) 
through (5.14). We now show that these equations and the definitions of S and T will 
ensure success of the algorithm. 

Suppose first that $|T| \ge n^{5/8}/(\log n)^{3/2}$. By equation (5.14) above, set $T$ contains an independent set ($T\cap R$) of at least a fraction $\frac{1}{2}\left(1 - \frac{1}{\log n}\right)$ of its vertices (using $\delta = \frac{1}{5\log n}$). So by Lemma 4.9, the BE/MS vertex-cover algorithm finds an independent set of size $\Omega(n^{5/8}/(\log n)^{5/2}) = \Omega(n/f(n))$, so we make progress Type 1 [Large-IS] in Step 2(c).

On the other hand, if $|T| < n^{5/8}/(\log n)^{3/2}$, then we just need to show that $S$ and $T$ satisfy the conditions of Theorem 5.2. Clearly, $S$ is 2-colored under any legal 3-coloring of $G$ since $S \subseteq N(v)$, so Condition 1 is satisfied. For $f(n) = n^{3/8}(\log n)^{5/2}$, Condition 2 reduces to $D_T(S)/|S| = \Omega(n^{1/4}/(\log n)^3)$, which is easily met using equations (5.11) and (5.13) as follows.

$$D_T(S) \ge D_{T\cap R}(S) = \Omega(D(S)/(\log n)^3) \tag{5.15}$$

$$D_T(S) = \Omega(d|S|/(\log n)^3). \tag{5.16}$$

So,

$$D_T(S)/|S| = \Omega(n^{3/8}/(\log n)^{1/2}) \tag{5.17}$$

$$D_T(S)/|S| = \Omega(n^{1/4}/(\log n)^3). \tag{5.18}$$

The last task is to show that Condition 3 is satisfied, which for the given value of $f$ reduces to the requirement that

$$[D_T(S)]^3 = \Omega\left(\left[|S| + \max_{v\in S} d_T(v)\right]\cdot\left[|S||T|^2\frac{n^{1/4}}{(\log n)^4} + |T||S|^2\frac{n^{1/2}}{(\log n)^{10}}\right]\right). \tag{5.19}$$

To show that this requirement holds, we upper bound the quantities $|S|$, $|T|$, and $\max_{v\in S} d_T(v)$.

From equation (5.17), we have

$$|S| = O((\log n)^{1/2}D_T(S)/n^{3/8}). \tag{5.20}$$

Next, our very condition for this case was that:

$$|T| = O(n^{5/8}/(\log n)^{3/2}). \tag{5.21}$$

Finally, since $S \subseteq I_j$, all vertices of $S$ have nearly the same degree (though not necessarily the same degree into $T$), so we can bound $\max_{u\in S} d_T(u)$ as follows:

$$\begin{aligned}
\max_{u\in S} d_T(u) &= O(D(S)/|S|)\\
&= O(D_T(S)(\log n)^3/|S|) \quad\text{(using equation 5.15)}\\
&= O(D_T(S)(\log n)^3(\log n)^{3/2}/n^{3/8}) \quad\text{(using equation 5.12)}\\
&= O(D_T(S)(\log n)^{9/2}/n^{3/8}).
\end{aligned}\tag{5.22}$$

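The three equations (5.20), (5.21), and (5.22) allow us to reduce requirement (5.19) to the condition that:

$$[D_T(S)]^3 = \Omega\left(\frac{D_T(S)(\log n)^{9/2}}{n^{3/8}}\left[\frac{D_T(S)\,n^{9/8}}{(\log n)^{13/2}} + \frac{[D_T(S)]^2\,n^{3/8}}{(\log n)^{21/2}}\right]\right) = \Omega\left(\frac{[D_T(S)]^2\,n^{3/4}}{(\log n)^2} + \frac{[D_T(S)]^3}{(\log n)^6}\right).$$

Equivalently, we just have the requirement that $D_T(S) = \Omega(n^{3/4}/(\log n)^2 + D_T(S)/(\log n)^6)$. Clearly, $D_T(S) = \Omega(D_T(S)/(\log n)^6)$, so we simply need $D_T(S) = \Omega(n^{3/4}/(\log n)^2)$. We are now done, because combining equations (5.17) and (5.12) yields:

$$D_T(S) = \Omega(|S|\,n^{3/8}/(\log n)^{1/2}) = \Omega(n^{3/4}/(\log n)^2).$$

Thus, Step 2(d) of Algorithm Improved-Approx makes progress. ■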
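The reduction of requirement (5.19) is pure exponent bookkeeping, so it can be double-checked mechanically. In the sketch below (our own device, not part of the thesis), a quantity $D_T(S)^x\, n^y(\log n)^z$ is stored as the exponent triple $(x, y, z)$, multiplication adds triples, and the upper bounds (5.20) through (5.22) are entered as such triples.

```python
from fractions import Fraction as F


def mul(*terms):
    """Multiply monomials D^x n^y (log n)^z given as exponent triples (x, y, z)."""
    return tuple(sum(t[i] for t in terms) for i in range(3))


S_ub = (1, F(-3, 8), F(1, 2))       # (5.20): |S| = O((log n)^{1/2} D / n^{3/8})
T_ub = (0, F(5, 8), F(-3, 2))       # (5.21): |T| = O(n^{5/8} / (log n)^{3/2})
maxdeg_ub = (1, F(-3, 8), F(9, 2))  # (5.22): max d_T = O(D (log n)^{9/2} / n^{3/8})

# The first bracket of (5.19) is dominated by the max-degree bound (larger log power).
first = maxdeg_ub
term1 = mul(S_ub, T_ub, T_ub, (0, F(1, 4), -4))   # |S||T|^2 n^{1/4} / (log n)^4
term2 = mul(T_ub, S_ub, S_ub, (0, F(1, 2), -10))  # |T||S|^2 n^{1/2} / (log n)^10

print(mul(first, term1))  # exponents of D^2 n^{3/4} / (log n)^2
print(mul(first, term2))  # exponents of D^3 / (log n)^6
```

The two products reproduce the two terms on the right-hand side of the reduced condition.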



Chapter 6 

Worst-case bounds for $k$-colorable graphs



We now consider two different methods for using the preceding techniques developed for 3-colorable graphs to improve the bounds for approximately coloring $k$-colorable graphs for fixed $k > 3$. One method is simply to use the preceding algorithms as an improved base case for a recursive strategy used by Wigderson [43]. A second method is to directly extend the above algorithms for $k > 3$. For the latter approach, one needs both an analog of the shared neighborhood condition (Theorem 4.1), and a way to cascade together several applications of the distance-2 neighbor-taking process (Step 3 of Algorithm First-Approx) so that we can "pump up" the relative size of the largest independent set. We will see that the second method yields better asymptotic bounds than the first, though with diminishing returns as $k$ increases. However, the running time of the second method grows as $(n\log n)^{2k+O(1)}$, while the running time of the first is dominated just by the time taken by the base-case algorithm. The two methods can be combined, providing a time/performance tradeoff, by choosing some $k_0$ and using the second method as a base case for the first method for $k > k_0$. This will result in an algorithm with running time $O((n\log^2 n)^{2k_0+c})$ for some constant $c$.

The results of these approaches are summarized (in "$\tilde{O}$" notation) in Table 6.1. The first row shows the bound for using Wigderson's algorithm with base case at $k = 2$. The second and third rows show how the bounds are improved when we use the new coloring method as base cases for $k = 3$ and $k = 4$ respectively. The last row shows the best bounds we can get using the direct extension. Note: the bounds in the last two rows are with high probability over the coin tosses of the algorithm. See Corollaries 6.2 and 6.7 for more precise bounds.

6.1 A simple recursive approach 

A standard method [43] [6] [22] to approximately color $k$-colorable graphs is to pick a vertex of high degree and recursively try to color its $(k-1)$-colorable set of neighbors with as few colors as possible. When we get to a 2-colorable set, we can just directly 2-color that set in the standard way. For example, Wigderson's algorithm for coloring $k$-colorable graphs with $kn^{1-1/(k-1)}$ colors can be described as follows:
                  k = 3               k = 4               k = 5                   k = 6                   k = 7                     general
Wigderson [43]    $n^{1/2}$ (0.5)     $n^{2/3}$ (0.667)   $n^{3/4}$ (0.75)        $n^{4/5}$ (0.8)         $n^{5/6}$ (0.833)         $n^{1-1/(k-1)}$
base: k = 3       $n^{3/8}$ (0.375)   $n^{8/13}$ (0.615)  $n^{13/18}$ (0.722)     $n^{18/23}$ (0.783)     $n^{23/28}$ (0.821)       $n^{1-1/(k-7/5)}$
base: k = 4       -                   $n^{3/5}$ (0.6)     $n^{5/7}$ (0.714)       $n^{7/9}$ (0.778)       $n^{9/11}$ (0.818)        $n^{1-1/(k-3/2)}$
best we have      $n^{3/8}$ (0.375)   $n^{3/5}$ (0.6)     $n^{91/131}$ (0.695)    $n^{105/137}$ (0.766)   $n^{5301/6581}$ (0.806)   -

Table 6.1: Summary of results in "$\tilde{O}$" notation for various combinations of algorithms; decimal values of the exponents are given in parentheses. Items "base: $k = 3$" and "base: $k = 4$" correspond to using Algorithm Recursive-Color with Algorithm Multi-Stage-Color as a base case for $k = 3$ or 4 respectively.



Wigderson's Algorithm for $k$-colorable graphs:
Given: A $k$-colorable graph $G$ on $n$ vertices.
Output: A coloring with at most $kn^{1-1/(k-1)}$ colors.

1. If there exists a vertex $v$ with at least $n^{1-1/(k-1)}$ neighbors, then color the neighborhood recursively with $(k-1)\left(n^{1-1/(k-1)}\right)^{1-1/(k-2)} = (k-1)\,n^{\frac{k-2}{k-1}\cdot\frac{k-3}{k-2}} = (k-1)\,n^{\frac{k-3}{k-1}}$ colors. Then remove those nodes from the graph and the colors from the palette.

Note that this step can be executed at most $n^{1/(k-1)}$ times, resulting in a total of $(k-1)\,n^{\frac{k-3}{k-1}+\frac{1}{k-1}} = (k-1)\,n^{1-1/(k-1)}$ colors used in this step.

2. Otherwise, greedily color the graph left with $n^{1-1/(k-1)}$ colors.

So, the total number of colors used in both steps together is

$$kn^{1-1/(k-1)}.$$

(Note that for the base case of $k = 2$, we have $2 = 2n^{1-1/(2-1)}$.)
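For concreteness, here is a small Python sketch of Wigderson's algorithm as described above, using a dict-of-sets adjacency representation (our choice). It assumes the input really is $k$-colorable: the base case $k = 2$ is a breadth-first 2-coloring, and each recursive call receives the subgraph induced by a neighborhood, which is then $(k-1)$-colorable. Each recursive call gets a fresh block of colors, mirroring "remove the colors from the palette". It is meant to illustrate the control flow, not to be an efficient implementation.

```python
from collections import deque


def two_color(adj, vertices):
    """BFS 2-coloring of the subgraph induced by `vertices` (the k = 2 base case)."""
    color = {}
    for s in vertices:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u] & vertices:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    raise ValueError("induced subgraph is not 2-colorable")
    return color


def wigderson(adj, vertices, k):
    """Color the subgraph induced by `vertices`, assumed k-colorable.

    Returns a dict vertex -> color (non-negative ints).
    """
    vertices = set(vertices)
    if k <= 2:
        return two_color(adj, vertices)
    threshold = len(vertices) ** (1 - 1 / (k - 1))
    coloring, next_color = {}, 0
    remaining = set(vertices)
    # Step 1: while some vertex has >= threshold neighbors left, color its
    # neighborhood recursively with a fresh block of colors and remove it.
    while True:
        v = next((u for u in remaining if len(adj[u] & remaining) >= threshold), None)
        if v is None:
            break
        nbhd = adj[v] & remaining
        sub = wigderson(adj, nbhd, k - 1)
        for u, c in sub.items():
            coloring[u] = next_color + c
        next_color += max(sub.values()) + 1
        remaining -= nbhd
    # Step 2: every remaining vertex has < threshold remaining neighbors, so
    # greedy coloring with colors >= next_color needs at most about `threshold`.
    for u in remaining:
        used = {coloring[w] for w in adj[u] & remaining if w in coloring}
        c = next_color
        while c in used:
            c += 1
        coloring[u] = c
    return coloring
```

On the complete tripartite graph $K_{3,3,3}$, for instance, the first vertex chosen has the other two parts as its neighborhood; these get two colors, and the remaining independent part gets a third.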

The algorithms presented in the previous chapters allow one to stop at $k = 3$ as a base case instead of $k = 2$ in this type of procedure and thus use fewer colors. More generally, we can describe when a bound achieved for coloring graphs of chromatic number $k_0$ will improve the performance of this kind of recursive procedure for graphs of higher chromatic number. In particular, suppose we have an algorithm $A$ to color any $n$-vertex $k_0$-colorable graph with $\tilde{O}(n^a)$ colors. Then, the important quantity for this approach, which we call the recursive performance $r(A)$ of the algorithm, is:

$$r(A) = k_0 - \frac{1}{1-a}. \tag{6.1}$$

If an algorithm has a higher value of $r$, then the bounds achieved by using that as a base case for $k > k_0$ will be improved. Specifically, the recursive algorithm will color $k$-colorable graphs for $k \ge k_0$ with $\tilde{O}(n^{1-1/(k-r(A))})$ colors. So, for example, using the fact that we can 2-color 2-colorable graphs ($k_0 = 2$, $a = 0$), we find $r = 1$ and the bound is $\tilde{O}(n^{1-1/(k-1)})$. Using the improved bounds for coloring 3-colorable graphs in Chapter 5 ($k_0 = 3$, $a = 3/8$), we get $r = 3 - \frac{1}{1-3/8} = 7/5$, so the improved bound for $k \ge 3$ is:

$$\tilde{O}(n^{1-1/(k-7/5)}) \text{ colors.} \tag{6.2}$$

Later, in Section 6.2, we will see how to color 4-colorable graphs with $\tilde{O}(n^{3/5})$ colors, so we get $r = 4 - \frac{1}{1-3/5} = 3/2$. Thus, for $k \ge 4$, we can color with $\tilde{O}(n^{1-1/(k-3/2)})$ colors.
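The arithmetic behind (6.1) and the middle rows of Table 6.1 can be reproduced with exact rational arithmetic. The following sketch (helper names are ours) computes $r(A)$ from $(k_0, a)$ and then the exponent $1 - 1/(k - r)$ for each $k$, ignoring logarithmic factors.

```python
from fractions import Fraction as F


def recursive_performance(k0, a):
    """r(A) = k0 - 1/(1 - a), equation (6.1)."""
    return k0 - 1 / (1 - F(a))


def exponent(k, r):
    """Exponent of n in the bound n^{1 - 1/(k - r)} for k-colorable graphs."""
    return 1 - 1 / (F(k) - r)


for name, k0, a in [("Wigderson", 2, 0), ("base k=3", 3, F(3, 8)), ("base k=4", 4, F(3, 5))]:
    r = recursive_performance(k0, a)
    row = [exponent(k, r) for k in range(max(k0, 3), 8)]
    print(name, "r =", r, [str(e) for e in row])
```

A larger $r$ shrinks $k - r$ and hence the exponent, which is why the improved base cases help for every larger $k$.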

The following theorem more precisely describes the bounds achieved by the recursive 
approach. 

Theorem 6.1 Given an algorithm $A$ to color any $m$-vertex $k_0$-colorable graph with $cm^a\log^\beta m$ colors, algorithm Recursive-Color($A$) below can color any $n$-vertex $k$-colorable graph ($k \ge k_0$) with at most:

$$C_k(n) = [c + (k - k_0)]\,n^{1-1/(k-r)}(\log n)^{\beta(k_0-r)/(k-r)} \tag{6.3}$$

colors, where $r = r(A) = k_0 - \frac{1}{1-a}$.

Using Theorem 6.1 and the bounds achieved by Algorithm Improved-Approx ($k_0 = 3$, $a = 3/8$, $\beta = 5/2$), we can restate formula (6.2) more precisely in the following corollary.

Corollary 6.2 Algorithm Recursive-Color(Improved-Approx) colors any $n$-vertex $k$-colorable graph ($k \ge 3$) with at most

$$O\left(n^{1-1/(k-7/5)}(\log n)^{20/(5k-7)}\right)$$

colors.

The recursive algorithm to achieve these bounds is described below. 

Algorithm Recursive-Color: (Variant on Wigderson's algorithm)

Given: An $n$-vertex $k$-colorable graph $G$ and an algorithm $A$ to color any $m$-vertex $k_0$-colorable graph with at most $C_{k_0}(m) = cm^a\log^\beta m$ colors ($k_0 \le k$).

Output: A $C_k(n)$-coloring of $G$, for $C_k(n)$ as defined in equation (6.3).

1. Let $r = k_0 - \frac{1}{1-a}$.

2. Let $f(n,k) = n^{1-1/(k-r)}(\log n)^{\beta(k_0-r)/(k-r)}$.

3. While there exists a vertex with at least $f(n,k)$ neighbors, select $f(n,k)$ of its neighbors and color them with $C_{k-1}(f(n,k))$ colors. Remove those nodes from the graph and the colors from the palette.

Note that we can execute this step at most $n/f(n,k)$ times.

4. Otherwise, greedily color the graph with $f(n,k)$ colors.

Proof of Theorem 6.1: Let $A$ be an algorithm that colors any $m$-vertex $k_0$-colorable graph with $cm^a\log^\beta m$ colors and let $r = r(A)$. We will use $C_k(n)$ to denote the coloring bound achieved on $n$-vertex $k$-colorable graphs. First, formula (6.3) in the statement of the theorem holds for the base case of $k = k_0$, since $k_0 - r = \frac{1}{1-a}$ and so for $k = k_0$ we have:

$$C_{k_0}(n) = c\,n^{1-1/(k_0-r)}(\log n)^{\beta(k_0-r)/(k_0-r)} = cn^a\log^\beta n.$$

Let $c_k = c + (k - k_0)$ and let $f(n,k) = n^{1-1/(k-r)}(\log n)^{\beta(k_0-r)/(k-r)}$ as in Algorithm Recursive-Color. So, assuming the bounds of Theorem 6.1 inductively for $k' < k$, we need to show that $C_k(n) \le c_k f(n,k)$.

Since we can loop in Step 3 of Algorithm Recursive-Color at most $n/f(n,k)$ times, this results in the recurrence:

$$C_k(n) \le C_{k-1}(f(n,k))\,\lfloor n/f(n,k)\rfloor + f(n,k).$$

So, substituting in the bounds of Theorem 6.1 inductively, we have:

$$\begin{aligned}
C_k(n) &\le c_{k-1}[f(n,k)]^{1-1/(k-1-r)}[\log f(n,k)]^{\frac{\beta(k_0-r)}{k-1-r}}\left\lfloor\frac{n}{f(n,k)}\right\rfloor + f(n,k)\\
&\le c_{k-1}\,n\,[f(n,k)]^{-1/(k-1-r)}[\log n]^{\frac{\beta(k_0-r)}{k-1-r}} + f(n,k)\\
&= c_{k-1}\,n\left[n^{\frac{k-r-1}{k-r}}(\log n)^{\frac{\beta(k_0-r)}{k-r}}\right]^{-1/(k-1-r)}[\log n]^{\frac{\beta(k_0-r)}{k-1-r}} + f(n,k)\\
&= c_{k-1}\,n^{1-\frac{1}{k-r}}[\log n]^{\frac{\beta(k_0-r)}{k-1-r}\left(1-\frac{1}{k-r}\right)} + f(n,k)\\
&= c_{k-1}\,n^{1-\frac{1}{k-r}}[\log n]^{\frac{\beta(k_0-r)}{k-r}} + f(n,k)\\
&= c_{k-1}f(n,k) + f(n,k)\\
&= c_k f(n,k).
\end{aligned}$$

■
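The induction can also be sanity-checked numerically. The sketch below uses our own function names, base-2 logarithms, and the Improved-Approx parameters $k_0 = 3$, $a = 3/8$, $\beta = 5/2$ with $c = 1$ as a sample choice. It evaluates the right-hand side of the recurrence with the bound (6.3) substituted for $C_{k-1}$, and checks that the result does not exceed $c_k f(n,k) = C_k(n)$.

```python
import math


def f(n, k, k0, a, beta):
    """f(n, k) from Step 2 of Algorithm Recursive-Color (base-2 logs assumed)."""
    r = k0 - 1 / (1 - a)
    return n ** (1 - 1 / (k - r)) * math.log2(n) ** (beta * (k0 - r) / (k - r))


def bound(n, k, k0, a, beta, c):
    """C_k(n) from equation (6.3), i.e. (c + k - k0) * f(n, k)."""
    return (c + (k - k0)) * f(n, k, k0, a, beta)


def recurrence_holds(n, k, k0=3, a=3 / 8, beta=5 / 2, c=1.0):
    """Check C_{k-1}(f) * floor(n / f) + f <= C_k(n), with (6.3) used for C_{k-1}."""
    fk = f(n, k, k0, a, beta)
    rhs = bound(fk, k - 1, k0, a, beta, c) * math.floor(n / fk) + fk
    return rhs <= bound(n, k, k0, a, beta, c) * (1 + 1e-12)


for k in (4, 5, 6):
    print(k, all(recurrence_holds(2.0 ** e, k) for e in (20, 40, 80)))
```

The slack in the check comes from the two places where the proof weakens the bound: $\log f(n,k) < \log n$ and $\lfloor n/f(n,k)\rfloor \le n/f(n,k)$.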
6.2 Directly extending the k = 3 algorithm

6.2.1 Motivation 

In this section, we describe how the methods of Algorithm First-Approx of Chapter 4 can be applied directly to graphs of higher chromatic number, yielding improved coloring bounds for such graphs. Unfortunately, we do not know a way to extend the approach of Algorithm Improved-Approx in a similar way, though it can still provide a useful "base case".

The main idea of Algorithm First-Approx was to look at large subsets of the distance-2 neighbors of vertices in a 3-colorable graph: in particular, the sets $N_i(N(v)\cap I_j)$ for each vertex $v$ and each pair of indices $i, j$. The "well-distributed" property proved in Theorems 4.5 and 4.6 ensures that one such set will be nearly half red under some legal 3-coloring of the graph, and the expansion property of Theorem 4.1 ensures the set is large as well.

While the expansion property depended heavily on the graph being 3-colorable, the theorems forcing good distribution require only that the given graph have an independent set of large total degree (see Section 4.3.2). In particular, they simply require that there exist a large independent set $R$ such that $D_R(V - R) \ge \lambda D(V - R)$ for some constant $\lambda$ and that the graph have sufficiently large minimum degree. So, we could conceivably make progress on graphs of a higher chromatic number than 3 by cascading several applications of the distance-2 neighbor-taking stage in the following way.

Suppose, say, $G$ is a 5-colorable graph and we wish to color $G$ with $f(n)$ colors. Then, we know there exists an independent set $R$ such that $D_R(V - R) \ge \frac{1}{4}D(V - R)$, and we can establish a minimum degree of $f(n)$. If we could guarantee that no two vertices shared too many neighbors, we could look at the sets $T_{v,i,j}$ and be assured that one will be large and have an independent set $R' = R\cap T_{v,i,j}$ such that $|R'| \approx \frac{1}{4}|T_{v,i,j}|$, using Theorems 4.5 and 4.6. Let us now focus on the subgraph $G'$ induced by $T_{v,i,j}$, and let $V' = T_{v,i,j}$. Suppose we could in addition somehow ensure that within $G'$, the vertices of $R'$ had about the same average degree as the other vertices of $V'$. Then we would have $D(R') \approx \frac{1}{4}D(V')$, which would imply that:

$$D_{R'}(V' - R') \approx \tfrac{1}{3}D(V' - R'), \tag{6.4}$$

since $D_{R'}(V' - R') = D(R')$ and $D(R') \approx \frac{1}{4}D(V') = \frac{1}{4}(D(V' - R') + D(R'))$, where we are now counting degrees only within $G'$.

Now, if we re-establish a minimum degree without destroying (6.4) above, we could then re-apply the distance-2 neighbor-taking process within $G'$ to get a set $V''$ containing an independent set $R''$ such that $|R''| \approx \frac{1}{3}|V''|$. If again we could ensure that $D(R'') \approx \frac{1}{3} D(V'')$ within the new graph $G''$, we would get:

$$D_{R''}(V'' - R'') \approx \tfrac{1}{2} D(V'' - R'').$$

Thus, one final application of examining the sets $T_{v,i,j}$ within $G''$ will yield some set on which the BE/MS vertex-cover algorithm makes progress.
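As a sanity check of the heuristic cascade above, the following Python sketch (the function name is ours, not the thesis's) iterates the idealized update $\lambda \mapsto \lambda/(1-\lambda)$, which is the step from $D(R') \approx \lambda D(V')$ to $D_{R'}(V'-R') \approx \frac{\lambda}{1-\lambda} D(V'-R')$, starting from $\lambda = 1/(k-1)$ for a $k$-colorable graph. It deliberately ignores every error term that the following sections must control.

```python
from fractions import Fraction

def degree_ratio_cascade(k: int) -> list[Fraction]:
    """Idealized ratios lambda with D_R(V-R) ~ lambda * D(V-R) before each
    neighbor-taking stage, ignoring all error terms.

    Starts at lambda = 1/(k-1) (a k-colorable graph) and applies
    lambda -> lambda / (1 - lambda) until lambda reaches 1/2.
    """
    lam = Fraction(1, k - 1)
    ratios = [lam]
    while lam < Fraction(1, 2):
        # D(R') ~ lam * D(V')  implies  D(R') ~ lam/(1-lam) * D(V' - R')
        lam = lam / (1 - lam)
        ratios.append(lam)
    return ratios

print(degree_ratio_cascade(5))  # [Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)]
```

For $k = 5$ this reproduces the ratios 1/4, 1/3, 1/2 used above; in general the list has $k - 2$ entries, one for each neighbor-taking stage.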

So, the two main ingredients needed to make this procedure go through are (1) how to ensure that no two vertices share too many neighbors in common, and (2) how to get from $|R'| \approx \lambda|V'|$ to $D(R') \approx \lambda D(V')$. These problems are solved in the following sections.

6.2.2 The bootstrapping algorithm 

We now describe procedures that allow us to "bootstrap" applications of Algorithm First-Approx to graphs of higher chromatic number. The resulting algorithm Multi-Stage-Color will color any $n$-vertex $k$-colorable graph with:

• $f_k(n) = O(n^{\alpha(k)} \log^{\beta(k)} n)$ colors,

where $\alpha(k)$ will be defined inductively in $k$, and $\beta(k)$ is a nondecreasing function such that $\beta(k) < 5.5$. The exponent $\beta$ of the logarithm in fact approaches 5.5 as $k \to \infty$. Because $\alpha$ is the critical value and the log factors are low-order terms, for simplicity of analysis we will not attempt to get tight bounds, and will assume $\beta$ is fixed at 5.5 for all $k > 3$.

For base cases, $\alpha(2) = 0$ and $\alpha(3) = 3/8$ using algorithm Improved-Approx. The recursive formula for $\alpha(k)$ for $k > 3$ is:

$$\frac{1}{1-\alpha(k)} = 2 - \frac{1}{2^{k-2}} + \frac{1}{1-\alpha(k-2)}\left(1 - \frac{1}{2^{k-2}}\right). \tag{6.5}$$

We will examine this formula in more detail later; we just note here that $\alpha$ is nondecreasing in $k$.
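Recursion (6.5) is easy to evaluate exactly. A minimal sketch with Python's `fractions.Fraction`, assuming only the base cases and the recursion stated above (the function name `alpha` is ours):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def alpha(k: int) -> Fraction:
    """Exponent alpha(k) of Multi-Stage-Color, from recursion (6.5)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if k == 2:
        return Fraction(0)        # 2-colorable graphs: 2 colors
    if k == 3:
        return Fraction(3, 8)     # Algorithm Improved-Approx
    t = Fraction(1, 2 ** (k - 2))
    inv = 2 - t + (1 - t) / (1 - alpha(k - 2))   # this is 1 / (1 - alpha(k))
    return 1 - 1 / inv

for k in range(2, 8):
    print(k, alpha(k), round(float(alpha(k)), 4))
```

Under these assumptions it gives $\alpha(4) = 3/5$ and $\alpha(5) = 91/131 \approx 0.695$, and the printed values increase with $k$, in line with the remark that $\alpha$ is nondecreasing.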

We need in this section to redefine the value $\delta$ to depend on the chromatic number $k$ of the graph $G$ we wish to color. In particular, we shall use:

• $\delta = \delta(k) = \dfrac{1}{4^k \log n}$.

The sets $I_j$ and $N_i(\cdot)$ used in Chapter 4 now depend on this new quantity.

As mentioned previously, the theorems of Section 4.3.2 forcing good distribution do not require that the graph be 3-colorable, only that there exist a large independent set $R$ such that $D_R(V - R) \ge \lambda D(V - R)$ for some constant $\lambda$, and that the graph have sufficiently large minimum degree. Let us, in fact, repeat Corollary 4.7 here, removing all mention of the chromatic number of the graph. (The fact that the graph was 3-colorable was used only in showing that $\lambda \ge 1/2$.)
Corollary 6.3 (Variant of Corollary 4.7) Suppose $G = (V, E)$ is an $n$-vertex graph such that (1) no two vertices share more than $s$ neighbors, (2) $G$ has minimum degree $d_{\min} > \max(s(1+\delta), (3\log n)/\delta^2)$, and (3) $G$ contains an independent set $R$ such that $D_R(V - R) \ge \lambda D(V - R)$ for some constant $\lambda \in [0, 1]$. Then, for any $\delta = \Omega(1/\log n)$, for some $v \in V$ and some $i, j \in [0, \ldots, \log_{1+\delta} n]$, the set

$$T_{v,i,j} = N_i(N(v) \cap I_j)$$

has size at least $\Omega\bigl((d_{\min})^2/(s \log n)\bigr)$ and the property that $|T_{v,i,j} \cap R| \ge \lambda(1 - 5\delta)|T_{v,i,j}|$.
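To make the sets $T_{v,i,j}$ concrete, here is a Python sketch. It rests on an assumption, since the definitions of $I_j$ and $N_i(\cdot)$ live in Chapter 4 and are not restated here: we take $I_j$ to be the vertices whose degree lies in $[(1+\delta)^j, (1+\delta)^{j+1})$, and $N_i(S)$ to be the vertices with between $(1+\delta)^i$ and $(1+\delta)^{i+1}$ neighbors in $S$. The names `bucket` and `t_set` are hypothetical.

```python
def bucket(x: int, delta: float) -> int:
    """Index b with (1+delta)^b <= x < (1+delta)^(b+1), for an integer x >= 1."""
    b, bound = 0, 1 + delta
    while bound <= x:
        b += 1
        bound *= 1 + delta
    return b

def t_set(adj, v, i, j, delta):
    """Hypothetical reading of T_{v,i,j} = N_i(N(v) ∩ I_j).

    ASSUMED (not restated in this section):
      I_j    = vertices whose degree falls in bucket j,
      N_i(S) = vertices whose number of neighbors in S falls in bucket i.
    """
    s = {u for u in adj[v] if adj[u] and bucket(len(adj[u]), delta) == j}
    counts = {}                       # w -> number of neighbors of w inside S
    for u in s:
        for w in adj[u]:
            counts[w] = counts.get(w, 0) + 1
    return {w for w, c in counts.items() if bucket(c, delta) == i}
```

Under this reading, the index ranges $i, j \in [0, \ldots, \log_{1+\delta} n]$ in Corollary 6.3 simply enumerate the possible buckets.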

We now present a new method to ensure that no two vertices share too many neighbors. 

Theorem 6.4 Given an $n$-vertex $k$-colorable graph $G$ containing two vertices that share at least $n^{(1-\alpha(k))/(1-\alpha(k-2))}$ neighbors, and an algorithm $A$ to color any $m$-vertex $(k-2)$-colorable graph with $f_{k-2}(m)$ colors, Algorithm Sharing-Progress below will make progress towards an $f_k(n)$-coloring of $G$.

Algorithm Sharing-Progress: 

Given: (1) An $n$-vertex $k$-colorable graph $G$ containing two vertices that share at least $n^{(1-\alpha(k))/(1-\alpha(k-2))}$ neighbors, and (2) an algorithm $A$ to color any $m$-vertex $(k-2)$-colorable graph with $f_{k-2}(m)$ colors.

Output: Progress towards an $f_k(n)$-coloring of $G$.

1. Let $S = N(x) \cap N(y)$, where $x$ and $y$ share at least $n^{(1-\alpha(k))/(1-\alpha(k-2))}$ neighbors, and let $G_S$ be the subgraph induced by the set $S$.

2. Run algorithm $A$ on $G_S$. Note that if $G_S$ is $(k-2)$-colorable, then $A$ will color $G_S$ with at most

$$f_{k-2}(|S|) = O\bigl(|S|^{\alpha(k-2)} (\log |S|)^{\beta(k-2)}\bigr) \le O\bigl(|S|^{\alpha(k-2)} (\log n)^{\beta(k)}\bigr)$$

colors (using $|S| \le n$ and $\beta$ nondecreasing). Thus, Algorithm $A$ will find an independent set of size at least

$$\Omega\!\left(\frac{|S|^{1-\alpha(k-2)}}{(\log n)^{\beta(k)}}\right) = \Omega\!\left(\frac{n^{1-\alpha(k)}}{(\log n)^{\beta(k)}}\right) = \Omega\bigl(n/f_k(n)\bigr)$$

(for the given choice of $|S|$). Thus, if $G_S$ is $(k-2)$-colorable, then we have made progress of Type 1 [Large-IS].
3. If we did not make progress in Step 2, it must be that $G_S$ was not $(k-2)$-colorable. The only way this can happen is if $x$ and $y$ receive the same color under every legal $k$-coloring of $G$: if some legal $k$-coloring gave $x$ and $y$ different colors, every vertex of $S$, being adjacent to both, would use one of the remaining $k-2$ colors. So, we can merge vertices $x$ and $y$ and make progress of Type 3 [Same-Color].

The argument given in Algorithm Sharing-Progress proves Theorem 6.4. ■
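The two branches of Algorithm Sharing-Progress can be illustrated on tiny graphs. In this Python sketch the subroutine $A$ is replaced by a hypothetical brute-force search (`brute_force_coloring`, exponential time) that decides $(k-2)$-colorability of $G_S$ exactly; the real algorithm instead runs $A$ and declares failure when it exceeds its $f_{k-2}(|S|)$ color guarantee. Graphs are dicts mapping each vertex to its set of neighbors.

```python
from itertools import product

def brute_force_coloring(adj, vertices, colors):
    """Hypothetical stand-in for algorithm A: exhaustive search for a proper
    coloring of the subgraph induced by `vertices` using `colors` colors.
    Exponential time, so only usable on tiny illustrative graphs."""
    vs = sorted(vertices)
    for assignment in product(range(colors), repeat=len(vs)):
        col = dict(zip(vs, assignment))
        if all(col[u] != col[w] for u in vs for w in adj[u] if w in col):
            return col
    return None

def sharing_progress(adj, x, y, k, color_fn=brute_force_coloring):
    """Sketch of Algorithm Sharing-Progress for a pair x, y with many common neighbors."""
    s = set(adj[x]) & set(adj[y])          # Step 1: S = N(x) ∩ N(y)
    coloring = color_fn(adj, s, k - 2)     # Step 2: try to color G_S with k-2 colors
    if coloring is not None:
        classes = {}
        for v, c in coloring.items():
            classes.setdefault(c, set()).add(v)
        # Largest color class: an independent set with at least |S|/(k-2) vertices.
        return ("large-IS", max(classes.values(), key=len, default=set()))
    # Step 3: G_S is not (k-2)-colorable, so x and y agree in every legal k-coloring.
    return ("same-color", (x, y))
```

For example, with $k = 3$, if two vertices share an adjacent pair of neighbors, both vertices form a triangle with that pair, and the sketch reports that they must be merged.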

We now use Algorithm Sharing-Progress in a procedure that allows us to "bootstrap" applications of Step 3 of Algorithm First-Approx.

Algorithm Bootstrap: 

Given: (1) Values $\alpha \in [0, 1]$, $\beta$, and $\delta = \Omega(1/\log n)$, and (2) an $m$-vertex subgraph $H$ ($m \gg 1/\delta^2$) of an $n$-vertex graph $G$ such that $H$ contains an independent set $R$ with $|R| \ge \lambda|V(H)|$ for some constant $\lambda > 0$.

Output: Either (1) progress towards an $O(n^\alpha \log^\beta n)$-coloring of $G$, or else (2) at most $m/2$ subgraphs $G_0, G_1, \ldots, G_{m/2-1}$ of $H$ such that, with high probability, at least one $G_i$ has both a minimum degree of $\delta^2 m n^{\alpha-1} \log^\beta n$ and, considering only edges within $G_i$, $D(R \cap V(G_i)) \ge (\lambda - 2\delta) D(V(G_i))$.

1. Let $G_0 = (V_0, E_0) = H$. Inductively create graph $G_i = (V_i, E_i)$ from graph $G_{i-1}$ for $i = 1, 2, \ldots, m/2 - 1$ by selecting an edge of $E_{i-1}$ at random and deleting both of its endpoints. So, $|V_i| = |V_{i-1}| - 2$.

2. For each $G_i$ with at least $\delta m$ vertices, while $G_i$ contains a vertex of degree less than $\delta^2 m n^{\alpha-1} \log^\beta n$: delete from $G_i$ the vertex of minimum degree and all incident edges.

Suppose we have removed more than $\delta^2 m$ vertices from some $G_i$. Since, within the set $W_i$ of vertices deleted from $G_i$, the degree of each vertex can be at most $\delta^2 m n^{\alpha-1} \log^\beta n$, we can greedily find an independent set inside $W_i$ of size at least

$$\frac{\delta^2 m}{\delta^2 m n^{\alpha-1} \log^\beta n} = \frac{n}{n^\alpha \log^\beta n}.$$

So, we make progress of Type 1 [Large-IS] towards an $O(n^\alpha \log^\beta n)$-coloring of $G$.

3. If we did not make progress in Step 2, then output the graphs $G_i$ for $i = 0, 1, \ldots, m/2 - 1$.
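The two mechanical parts of Algorithm Bootstrap can be sketched in Python as follows. This is a minimal illustration, assuming graphs are dicts of neighbor sets with comparable (for example integer) vertex labels; the helper names are ours, and the threshold $\delta^2 m n^{\alpha-1}\log^\beta n$ is passed in as a plain number. `random_edge_deletions` follows Step 1, `peel_min_degree` follows the deletion loop of Step 2, and `greedy_independent_set` is the greedy search inside the deleted set $W_i$. Because every vertex of $W_i$ had fewer than `threshold` remaining neighbors when it was deleted, the subgraph on $W_i$ is degenerate, so each greedy pick removes at most `threshold` vertices.

```python
import random

def random_edge_deletions(adj, steps, rng):
    """Step 1: G_0 = H, and G_i is G_{i-1} minus both endpoints of a random edge."""
    alive = set(adj)
    graphs = [frozenset(alive)]
    for _ in range(steps):
        edges = [(u, w) for u in alive for w in adj[u] if w in alive and u < w]
        if not edges:
            break
        u, w = rng.choice(edges)          # uniformly random remaining edge
        alive -= {u, w}
        graphs.append(frozenset(alive))
    return graphs

def peel_min_degree(adj, vertices, threshold):
    """Step 2: delete a minimum-degree vertex while some degree is below threshold."""
    alive = set(vertices)
    deg = {v: sum(1 for w in adj[v] if w in alive) for v in alive}
    removed = []                          # this is the set W_i, in deletion order
    while alive:
        v = min(alive, key=deg.__getitem__)
        if deg[v] >= threshold:
            break
        alive.remove(v)
        removed.append(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
    return alive, removed

def greedy_independent_set(adj, vertices):
    """Greedy independent set inside W_i: take a min-degree vertex, drop its neighbors."""
    remaining = set(vertices)
    indep = set()
    while remaining:
        v = min(remaining, key=lambda u: sum(1 for w in adj[u] if w in remaining))
        indep.add(v)
        remaining -= {v} | set(adj[v])
    return indep
```

In the algorithm proper, Step 2 is run on each $G_i$ with at least $\delta m$ vertices, and the greedy search is needed only when more than $\delta^2 m$ vertices are peeled.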

Theorem 6.5 [Algorithm Bootstrap works as guaranteed] Let $H$ be an $m$-vertex subgraph ($m \gg 1/\delta^2$) of an $n$-vertex graph $G$ such that $H$ contains an independent set $R$ with $|R| \ge \lambda|V(H)|$ for some constant $\lambda$. Then, either (1) Algorithm Bootstrap makes progress towards an $O(n^\alpha \log^\beta n)$-coloring of $G$ in Step 2, or else (2) with high probability, one of the subgraphs $G_i = (V_i, E_i)$ has both a minimum degree of $\delta^2 m n^{\alpha-1} \log^\beta n$ and, within the subgraph, $D(R \cap V_i) \ge (\lambda - 2\delta) D(V_i)$.

Proof: Let us consider the graphs $G_i$ created after Step 1 of Algorithm Bootstrap, but before deleting vertices in Step 2. Let $R_i = V_i \cap R$ and let $N = m(1 - \delta)/2$; note that the set $V_N$ contains $\delta m$ vertices. We show now that with high probability, for some index $i < N$, we have $D(R_i) \ge (\lambda - \delta) D(V_i)$. The idea of the argument is that since we are removing vertices with probability proportional to their degree, if $D(R_i) < (\lambda - \delta) D(V_i)$ for all such $i$, then we would remove many fewer vertices from $R$ than from $V - R$. In fact, with high probability we would remove so many fewer that once we reach graph $G_N$, the set $R_N$ would be larger than $V_N$, a clear contradiction.

For each $i < N$, let $A_i$ be the event that in creating $G_{i+1}$ from $G_i$, we delete an edge with an endpoint in $R_i$. Since the number of edges in $E_i$ with an endpoint in $R_i$ is exactly $D(R_i)$ (because $R_i$ is an independent set), we have:

$$\Pr[A_i] = D(R_i)/|E_i| = 2D(R_i)/D(V_i). \tag{6.6}$$

Suppose for some index $i < N$ we have $D(R_i) < (\lambda - \delta) D(V_i)$. Then the probability that event $A_i$ occurs is at most $2(\lambda - \delta)$.
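Equation (6.6) uses two counting facts: when $R$ is independent, the edges with an endpoint in $R$ number exactly $D(R)$, and $|E| = D(V)/2$. A small Python check on a 5-cycle, with helper names of our own choosing:

```python
from fractions import Fraction

def prob_random_edge_touches(adj, r):
    """Probability that a uniformly random edge has an endpoint in r."""
    edges = {frozenset((u, w)) for u in adj for w in adj[u]}
    hits = sum(1 for e in edges if e & r)
    return Fraction(hits, len(edges))

def degree_sum(adj, s):
    """D(S): the total degree of the vertices in S."""
    return sum(len(adj[v]) for v in s)

# 5-cycle 0-1-2-3-4-0 with the independent set R = {0, 2}
c5 = {v: {(v - 1) % 5, (v + 1) % 5} for v in range(5)}
R = {0, 2}
print(prob_random_edge_touches(c5, R), 2 * Fraction(degree_sum(c5, R), degree_sum(c5, c5)))
```

Both quantities equal $4/5$ here, matching $\Pr[A_i] = 2D(R_i)/D(V_i)$.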

Let $p = 2(\lambda - \delta)$ and assume for contradiction that $D(R_i) < (\lambda - \delta) D(V_i)$ for every $i < N$. So, for each $i < N$, the probability that the $i$th edge removed from $G$ has an endpoint in $R$ is less than $p$. Since we remove $N$ edges to create $G_N$, and every time we remove an edge the probability that it has an endpoint in $R$ is less than $p$, by Chernoff bounds [2] the probability that we remove more than $pN(1 + \delta)$ vertices from $R$ is at most $e^{-\delta^2 pN/3}$. Since $pN = \Omega(m)$ and we assume $m \gg 1/\delta^2$ in the statement of the theorem, the probability that we remove more than $pN(1 + \delta)$ vertices from $R$ is $o(1)$. Thus, with high probability:

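$$\begin{aligned} |R_N| &\ge \lambda m - pN(1 + \delta) \\ &= \lambda m - 2(\lambda - \delta)\,[m(1 - \delta)/2]\,(1 + \delta) \\ &= m[\lambda - (\lambda - \delta)(1 - \delta^2)] \\ &= \delta m + m\delta^2(\lambda - \delta) \\ &> \delta m. \quad (\text{since } \lambda > \delta) \end{aligned}$$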

So, with high probability, $|R_N| > |V_N|$, a contradiction. Thus, with high probability our assumption that $D(R_i) < (\lambda - \delta) D(V_i)$ for every $i < N$ is incorrect; that is, for some $V_i$ of size at least $\delta m$, we have $D(R_i) \ge (\lambda - \delta) D(V_i)$.
Now, let $i$ be such that $|V_i| \ge \delta m$ and $D(R_i) \ge (\lambda - \delta) D(V_i)$ before Step 2 of Algorithm Bootstrap. In Step 2, if at most $\delta^2 m$ vertices are removed, then we remove at most a fraction $\delta$ of the vertices of $V_i$ in order to establish the desired minimum degree. Since we always remove a vertex of least degree, we remove at most $\delta D(V_i)$ from the total degree sum of the subgraph. Even if, at worst, all the vertices removed were from the set $R_i$, we still have in the remaining graph that

$$D(R_i) \ge (\lambda - 2\delta) D(V_i),$$

as claimed. ■

Given Theorem 6.5, we have an improved approximation algorithm for coloring graphs of chromatic number $k > 3$ as follows. We first apply Algorithm Sharing-Progress; we then run the distance-2 neighbor-taking stage of Algorithm First-Approx $k - 2$ times, using Algorithm Bootstrap to "clean up" the graph between applications; and finally, we use the BE/MS vertex-cover algorithm. The formal algorithm to color any $k$-colorable graph with $O(n^{\alpha(k)} \log^{\beta(k)} n)$ colors is given below. For simplicity, we have separated the distance-2-neighbor/bootstrap step into a separate procedure.

Algorithm Multi-Stage-Color: 

Given: An $n$-vertex $k$-colorable graph $G$.

Output: Progress towards an $O(n^\alpha \log^\beta n)$-coloring of $G$ for $\alpha = \alpha(k)$ as defined by the recursion in equation (6.5), and $\beta$ at most 5.5.

Let $f(n) = n^\alpha \log^\beta n$.

1. [Base case] If $k = 2$, then just color $G$ with 2 colors. If $k = 3$, then run Algorithm Improved-Approx on $G$.

2. [Minimum degree] For each vertex $v$, if $d(v) < f(n)$, make progress of Type 2.

3. [Minimum sharing of neighbors] For each pair of vertices $u, v$, if $|N(u) \cap N(v)| \ge n^{(1-\alpha(k))/(1-\alpha(k-2))}$, then make progress using Algorithm Sharing-Progress. Note that Algorithm Sharing-Progress will use Algorithm Multi-Stage-Color recursively on $(k-2)$-colorable graphs.

4. [Initial distance-2 neighbors] For each vertex $v$ and each pair $i, j \in [0, \ldots, \log_{1+\delta} n]$, let $G_{v,i,j}$ be the subgraph induced by the set $N_i(N(v) \cap I_j)$.

5. [Additional neighbor-taking stages] For each graph $G_{v,i,j}$, run Procedure Iterate-neighbors below on input $(n, k, G_{v,i,j}, k - 3)$.
If the algorithm makes progress on any of the inputs given, then halt with success. Otherwise, let $G_1, \ldots, G_q$ be all the graphs returned by Iterate-neighbors, for $q = O\bigl((\log_{1+\delta} n)^{2k-4}\, n^{2k-5}\bigr)$.

6. [Vertex-Cover approximation] Run the BE/MS vertex-cover algorithm on the graphs $G_1, \ldots, G_q$.

Procedure Iterate-neighbors: $(n, k, G', iter)$

Given: Values $n$ and $k$, an $m$-vertex subgraph $G'$ of some $n$-vertex graph $G$, and a number of iterations $iter$.

Output: $O\bigl([m^2(\log_{1+\delta} m)^2]^{iter}\bigr)$ subgraphs of $G'$, or else progress towards an $O(n^{\alpha(k)} \log^{\beta(k)} n)$-coloring of $G$.

P1. If $iter = 0$, then return $G'$.

P2. If $iter \ge 1$, then run Algorithm Bootstrap on $G'$ with values $\alpha = \alpha(k)$, $\beta = \beta(k)$, and $\delta = \delta(k)$.

P3. If Algorithm Bootstrap returns progress towards an $O(n^{\alpha(k)} \log^{\beta(k)} n)$-coloring of $G$, then halt with success. Otherwise, let $H_0, \ldots, H_{m/2-1}$ be the subgraphs returned.

P4. Now, for each $H_l$ ($0 \le l \le m/2 - 1$), each vertex $v$ in $H_l$, and each pair of indices $i, j \in [0, \ldots, \log_{1+\delta} m]$ (note: there are at most $m^2(\log_{1+\delta} m)^2$ different 4-tuples $(l, v, i, j)$):

(a) Let $G_{l,v,i,j}$ be the subgraph of $H_l$ induced by $N_i(N(v) \cap I_j)$, where neighborhoods are taken within $H_l$.

(b) Run Iterate-neighbors$(n, k, G_{l,v,i,j}, iter - 1)$.

Theorem 6.6 Algorithm Multi-Stage-Color, given any $n$-vertex $k$-colorable graph, makes progress towards a coloring with $O(n^{\alpha(k)} (\log n)^{5.5})$ colors, for $\alpha(k)$ as defined in equation (6.5).

Before proving Theorem 6.6, let us examine the claimed performance more closely. Let $\gamma(k) = \frac{1}{1-\alpha(k)}$. Then equation (6.5) can be written as:

$$\gamma(k) = 2 - \frac{1}{2^{k-2}} + \gamma(k-2)\left(1 - \frac{1}{2^{k-2}}\right). \tag{6.7}$$

One can see from this equation immediately that $\gamma(k) < 2 + \gamma(k-2)$; that is, if we increase $k$ by 2, then $\gamma$ increases by less than 2. We can compare this with the simpler approach from Section 6.1. Algorithm Recursive-Color given there colors $k$-colorable graphs with $O(n^{\alpha'(k)})$ colors for $\alpha'(k) = 1 - \frac{1}{k-r}$ for some constant $r$. Thus, the quantity $\gamma'(k) = \frac{1}{1-\alpha'(k)}$ equals $k - r$, and $\gamma'(k) = 2 + \gamma'(k-2)$. Since the function $g(x) = 1 - 1/x$, which satisfies $\alpha(k) = g(\gamma(k))$, is increasing in $x$, the exponent $\alpha$ of Algorithm Multi-Stage-Color does not rise as rapidly as that of Algorithm Recursive-Color. Thus, the new approach yields better bounds. Because Algorithm Multi-Stage-Color is slower than Algorithm Recursive-Color, one can achieve time/performance tradeoffs by running the faster algorithm with the slower algorithm as a base case for some $k = k_0$. Table 6.1 at the beginning of this chapter shows the results for both algorithms and for various combinations. In particular, for example, we can substitute the bound of Theorem 6.6 for $k = 4$ into the bound of Theorem 6.1 to get the following corollary.

Corollary 6.7 Algorithm Recursive-Color, using Algorithm Multi-Stage-Color as a base case for $k = 4$, colors any $n$-vertex $k$-colorable graph ($k \ge 4$) with at most:

colors. 

Proof of Theorem 6.6:

We may assume $k > 3$, since otherwise we just run Algorithm Improved-Approx in Step 1 of Multi-Stage-Color. Define $s_k(n) = n^{(1-\alpha(k))/(1-\alpha(k-2))}$ and let $\alpha = \alpha(k)$ and $\beta = \beta(k)$. Steps 2 and 3 of Algorithm Multi-Stage-Color establish that the graph has a minimum degree of $n^\alpha \log^\beta n$ and that no two vertices share more than $s_k(n)$ neighbors.

Since $G$ is $k$-colorable, it must contain an independent set $R$ with $D_R(V - R) \ge \frac{1}{k-1} D(V - R)$. So, by Corollary 6.3, one of the graphs $G' = G_{v,i,j}$ created in Step 4 will both have size at least

$$m_1 = (d_{\min})^2/(s_k(n) \log^7 n) = n^{2\alpha} \log^{2\beta} n/(s_k(n) \log^7 n), \tag{6.8}$$

and contain an independent set of at least a $\lambda_1 = \frac{1}{k-1}(1 - 5\delta) \ge \frac{1}{k-1} - 5\delta$ fraction of its vertices.¹

We now examine the call to Procedure Iterate-neighbors. Suppose Iterate-neighbors is called with a graph $G'$ of at least $m_i$ vertices containing an independent set of at least a $\lambda_i$ fraction of its nodes. By Theorem 6.5, if Step P3 does not halt with success immediately, then one of the graphs $H_l$ produced will have both a minimum degree of $\delta^2 m_i n^{\alpha-1} \log^\beta n$ and contain an independent set $R'$ with $D(R') \ge (\lambda_i - 2\delta) D(V(H_l))$. Rewriting the latter inequality, we have $D(R') \ge (\lambda_i - 2\delta)[D(V(H_l) - R') + D(R')]$, so:

$$D_{R'}(V(H_l) - R') = D(R') \ge \frac{\lambda_i - 2\delta}{1 - \lambda_i + 2\delta}\, D(V(H_l) - R').$$

¹One can verify that the minimum degrees and the values $m_i$ defined satisfy the technical conditions of Corollary 6.3 (minimum degree $> \max(s(1+\delta), (3\log n)/\delta^2)$) and Theorem 6.5 ($m_i \gg 1/\delta^2$).

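Using the minimum degree bound and degree ratios above, Corollary 6.3 implies that one of the sets $G_{l,v,i,j}$ produced in Step P4(a) will both have size at least $m_{i+1}$ and contain an independent set of at least a fraction $\lambda_{i+1}$ of its vertices, where:

$$\begin{aligned} m_{i+1} &= \delta^4 m_i^2 n^{2\alpha-2} (\log n)^{2\beta}/(s_k(n) \log^7 n) \\ &= \Omega\bigl(m_i^2 n^{2\alpha-2} (\log n)^{2\beta}/(s_k(n) \log^{11} n)\bigr) \\ &= \Omega\bigl(m_i^2 n^{2\alpha-2}/s_k(n)\bigr) \quad (\text{for } \beta = 5.5), \end{aligned} \tag{6.9}$$

and

$$\begin{aligned} \lambda_{i+1} &\ge \frac{\lambda_i - 2\delta}{1 - \lambda_i + 2\delta} - 5\delta \\ &\ge \frac{\lambda_i}{1 - \lambda_i} - 13\delta \quad \text{for } \lambda_i < 1/2. \end{aligned} \tag{6.10}$$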

Thus, one of the graphs $G_l$ returned to Step 5 of Algorithm Multi-Stage-Color will have at least $m_{k-2}$ vertices and contain an independent set of size at least $\lambda_{k-2}|V(G_l)|$, where we must now solve for $m_{k-2}$ and $\lambda_{k-2}$.

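Claim 1: $\lambda_i \ge \frac{1}{k-i} - 4^{i+1}\delta$ for $1 \le i \le k-2$.

Proof: For $i = 1$ the claim holds, since $\lambda_1 \ge \frac{1}{k-1} - 5\delta$. For $1 \le i < k-2$, assuming the claim for $i$ and using equation (6.10), we have:

$$\begin{aligned} \lambda_{i+1} &\ge \left(\tfrac{1}{k-i} - 4^{i+1}\delta\right)\Big/\left(\tfrac{k-i-1}{k-i} + 4^{i+1}\delta\right) - 13\delta \\ &\ge \left(\tfrac{1}{k-i} - 2 \cdot 4^{i+1}\delta\right)\Big/\left(\tfrac{k-i-1}{k-i}\right) - 13\delta \\ &= \tfrac{1}{k-i-1} - 2 \cdot 4^{i+1}\delta\left(\tfrac{k-i}{k-i-1}\right) - 13\delta \\ &\ge \tfrac{1}{k-i-1} - 3 \cdot 4^{i+1}\delta - 13\delta \quad (\text{for } i < k-2) \\ &\ge \tfrac{1}{k-i-1} - 4^{i+2}\delta. \quad (\text{for } i + 1 \ge 2) \end{aligned}$$

□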
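Claim 1 can also be spot-checked numerically. The Python sketch below (the function name is ours) iterates the worst case of (6.10), that is, $\lambda_{i+1} = \lambda_i/(1-\lambda_i) - 13\delta$ starting from $\lambda_1 = \frac{1}{k-1} - 5\delta$, with $\delta = \delta(k) = 1/(4^k \log n)$, and compares each $\lambda_i$ with the bound $\frac{1}{k-i} - 4^{i+1}\delta$ used in the induction step. Treating $\log n$ as a plain integer parameter is an assumption made for illustration only.

```python
from fractions import Fraction

def claim1_holds(k: int, log_n: int) -> bool:
    """Check lambda_i >= 1/(k-i) - 4^(i+1) * delta for i = 1..k-2 along the
    worst case of (6.10), with delta = delta(k) = 1/(4^k log n)."""
    delta = Fraction(1, 4 ** k * log_n)
    lam = Fraction(1, k - 1) - 5 * delta        # lambda_1 from Corollary 6.3
    for i in range(1, k - 1):                   # i = 1, ..., k-2
        if lam < Fraction(1, k - i) - 4 ** (i + 1) * delta:
            return False
        lam = lam / (1 - lam) - 13 * delta      # worst case of (6.10): lambda_{i+1}
    return True

print(all(claim1_holds(k, L) for k in range(3, 10) for L in (1, 20, 1000)))
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.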

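So, for $\delta = \delta(k) = \frac{1}{4^k \log n}$, we have:

$$\lambda_{k-2} \ge \frac{1}{2} - 4^{k-1}\delta = \frac{1}{2} - \frac{1}{4\log n}. \tag{6.11}$$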

Claim 2: $m_i = \Omega\bigl(n^{(2^{i+1}-2)\alpha} \cdot n^{2-2^i} \cdot [s_k(n)]^{1-2^i}\bigr)$.

Proof: One can easily check that the claim holds for the base case $i = 1$, using equation (6.8) and the fact that for $\beta = 5.5$, $\log^{2\beta} n \ge \log^7 n$. For $i \ge 1$, we can check inductively that (6.9) carries the claim from $i$ to $i + 1$ as follows:

$$\begin{aligned} m_{i+1} &= \Omega\bigl(m_i^2 n^{2\alpha-2}/s_k(n)\bigr) \\ &= \Omega\bigl(n^{(2^{i+1}-2)2\alpha} \cdot n^{(2-2^i)2} \cdot [s_k(n)]^{2-2^{i+1}} \cdot n^{2\alpha-2}/s_k(n)\bigr) \\ &= \Omega\bigl(n^{(2^{i+2}-2)\alpha} \cdot n^{4-2^{i+1}-2} \cdot [s_k(n)]^{2-2^{i+1}-1}\bigr) \\ &= \Omega\bigl(n^{(2^{i+2}-2)\alpha} \cdot n^{2-2^{i+1}} \cdot [s_k(n)]^{1-2^{i+1}}\bigr). \end{aligned}$$

□



So,

$$m_{k-2} = \Omega\bigl(n^{(2^{k-1}-2)\alpha} \cdot n^{2-2^{k-2}} \cdot [s_k(n)]^{1-2^{k-2}}\bigr). \tag{6.12}$$
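The induction in Claim 2 is purely about exponents, so it can be checked mechanically. In this Python sketch (the names are ours) the exponent of $n$ in $m_i$ is stored as the triple (constant, coefficient of $\alpha$, coefficient of $\sigma$), where $\sigma = \log_n s_k(n)$ and polylogarithmic factors are dropped. The recursion (6.8)-(6.9) is compared with the closed form of Claim 2.

```python
def exponent_recursion(i: int):
    """Exponent of n in m_i as (constant, coeff of alpha, coeff of sigma),
    following (6.8)-(6.9) with sigma = log_n s_k(n) and polylogs dropped."""
    const, a, s = 0, 2, -1          # m_1 = n^{2 alpha} / s_k(n)
    for _ in range(i - 1):
        # m_{i+1} = m_i^2 * n^{2 alpha - 2} / s_k(n)
        const, a, s = 2 * const - 2, 2 * a + 2, 2 * s - 1
    return const, a, s

def exponent_closed_form(i: int):
    """Claim 2: m_i = n^{(2^{i+1}-2) alpha} * n^{2-2^i} * s_k(n)^{1-2^i}."""
    return 2 - 2 ** i, 2 ** (i + 1) - 2, 1 - 2 ** i

for i in range(1, 6):
    print(i, exponent_recursion(i), exponent_closed_form(i))
```

Each printed row shows two identical triples, and setting $i = k - 2$ gives the exponents in (6.12).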



Thus, one of the graphs of Step 5 of Algorithm Multi-Stage-Color will have an independent set of at least a $\lambda_{k-2} \ge \frac{1}{2} - O(1/\log n)$ fraction of its vertices (from equation (6.11)) and have size at least $m_{k-2}$, as given in equation (6.12). By Lemma 4.9, Step 6 will find an independent set of size at least $m_{k-2}/\log n$.

Thus, to prove Theorem 6.6 we must just show that $m_{k-2}/\log n = \Omega\bigl(n/(n^{\alpha(k)} \log^{\beta(k)} n)\bigr)$. Since $\beta(k)$ is set to 5.5, it is enough to have $m_{k-2} = \Omega(n^{1-\alpha(k)})$. Equivalently, using equation (6.12), taking $\log_n$ of both sides, and substituting in $s_k(n) = n^{(1-\alpha(k))/(1-\alpha(k-2))}$, we just need to show that:

$$1 - \alpha(k) \le \alpha(k)[2^{k-1} - 2] + [2 - 2^{k-2}] + \left[\frac{1-\alpha(k)}{1-\alpha(k-2)}\right](1 - 2^{k-2}).$$
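Rearranging terms, this formula is equivalent to:

$$[1 - \alpha(k)](2^{k-1} - 1) \le 2^{k-2} + \left[\frac{1-\alpha(k)}{1-\alpha(k-2)}\right](1 - 2^{k-2}),$$

or:

$$2^{k-1} - 1 \le \frac{2^{k-2}}{1-\alpha(k)} + \frac{1 - 2^{k-2}}{1-\alpha(k-2)}.$$

Dividing both sides by $2^{k-2}$ and rearranging one final time, we find that we just need:

$$\frac{1}{1-\alpha(k)} \ge 2 - \frac{1}{2^{k-2}} + \frac{1}{1-\alpha(k-2)}\left(1 - \frac{1}{2^{k-2}}\right).$$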

But this inequality holds with equality by the definition of $\alpha(k)$ given in equation (6.5). So Algorithm Multi-Stage-Color works as claimed. ■
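The final chain of rearrangements can be confirmed with exact arithmetic. This self-contained Python sketch re-implements recursion (6.5) (the helper names are ours) and checks that $\log_n m_{k-2}$, read off from (6.12) with $s_k(n) = n^{(1-\alpha(k))/(1-\alpha(k-2))}$, equals $1 - \alpha(k)$ exactly.

```python
from fractions import Fraction

def alpha(k: int) -> Fraction:
    """alpha(k) from recursion (6.5), with alpha(2) = 0 and alpha(3) = 3/8."""
    if k == 2:
        return Fraction(0)
    if k == 3:
        return Fraction(3, 8)
    t = Fraction(1, 2 ** (k - 2))
    return 1 - 1 / (2 - t + (1 - t) / (1 - alpha(k - 2)))

def log_n_of_m(k: int) -> Fraction:
    """log_n m_{k-2} from (6.12), with s_k(n) = n^{(1-alpha(k))/(1-alpha(k-2))}."""
    a = alpha(k)
    sigma = (1 - a) / (1 - alpha(k - 2))    # log_n s_k(n)
    i = k - 2
    return (2 ** (i + 1) - 2) * a + (2 - 2 ** i) + (1 - 2 ** i) * sigma

for k in range(4, 9):
    print(k, log_n_of_m(k), 1 - alpha(k))
```

For $k = 4$, for instance, both sides equal $2/5$, so the required bound $m_{k-2} = \Omega(n^{1-\alpha(k)})$ is met with no slack in the exponent.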



Chapter 7

Random models for $k$-colorable graphs



While the problem of coloring worst-case $k$-colorable graphs seems quite difficult, it turns out that coloring random $k$-colorable graphs is much easier. In fact, it is well known by results of Kucera [23], Turner [38], and others that random $k$-colorable graphs can be $k$-colored in polynomial time with high probability. These results show that, in fact, most $k$-colorable graphs are easy to $k$-color. Dyer and Frieze [18] go further and provide an algorithm that, when amortized over all $n$-vertex $k$-colorable graphs, spends polynomial time on average per graph. Experimental work on various heuristics for coloring random $k$-colorable graphs has been done by Petford and Welsh [31].

The standard model for a random $n$-vertex graph is the model $\mathcal{G}(n,p)$, in which each possible edge $(u, v)$ is placed into the graph with probability $p$. This model has the property that the distribution $\mathcal{G}(n, 1/2)$ is the same as that obtained by selecting a labeled $n$-vertex graph uniformly at random from the set of all labeled $n$-vertex graphs.

There are several natural models, however, for what one means by a random $k$-colorable graph. Dyer and Frieze examine several and prove relationships among them [18]. We focus here on one model that happens to be the simplest to analyze, which we shall denote $\mathcal{G}(n,p,k)$. A graph is selected in $\mathcal{G}(n,p,k)$ according to the following procedure. First, each labeled vertex is independently assigned to one of $k$ color classes with equal probability $1/k$. Then, independently for each pair $u, v$ of vertices in different color classes, the edge $(u, v)$ is placed into the graph with probability $p$. We use the notation:

• $G \leftarrow \mathcal{G}(n,p,k)$

to mean that $G$ is selected according to the distribution defined by this model.
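A direct transcription of this sampling procedure, as a minimal Python sketch (the function name `sample_gnpk` is ours; the caller supplies a `random.Random` instance so that runs are reproducible):

```python
import random

def sample_gnpk(n, p, k, rng):
    """Sample G <- G(n,p,k); returns (adjacency dict of sets, color list)."""
    # Each labeled vertex independently gets one of k color classes, uniformly.
    color = [rng.randrange(k) for _ in range(n)]
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Only pairs in different color classes may become edges, each w.p. p.
            if color[u] != color[v] and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, color

adj, color = sample_gnpk(20, 0.5, 3, random.Random(1))
```

By construction the planted coloring `color` is always a legal $k$-coloring of the sampled graph.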

The $\mathcal{G}(n,p,k)$ model is a natural one for a random $n$-vertex $k$-colorable graph, though even for $p = 1/2$ it is not equivalent to selecting a graph uniformly at random from the set of all $n$-vertex $k$-colorable graphs. In particular, graphs that can be $k$-colored in multiple different ways are over-represented in $\mathcal{G}(n, 1/2, k)$, since different assignments to the color classes may still lead to the same graph. (See Dyer and Frieze [18] for more on the relationship between the models.)

In this chapter, we consider the problem of $k$-coloring graphs in $\mathcal{G}(n,p,k)$ for as low an average edge density as possible. We present an algorithm to $k$-color such graphs with high probability for any constant $k$ and for $p \ge n^{-1+\epsilon}$; that is, the procedure will work for average degree as low as $n^\epsilon$ for any fixed $\epsilon > 0$. Before describing that algorithm, however, we first point out a quite easy method to $k$-color $G \leftarrow \mathcal{G}(n,p,k)$ for $p \ge n^{-1/2+\epsilon}$ ($\epsilon > 0$).

The idea of the easier procedure is simply this: two vertices from the same color class in $G$ will tend to share more neighbors than two vertices of different colors. Two vertices from the same class have an expected $n - n/k$ vertices they might potentially share as neighbors, and so can be expected to share $n(1 - 1/k)p^2$ neighbors in common. However, two vertices from different color classes have only an expected $n - 2n/k$ vertices they may share as neighbors, and so they can be expected to share only $n(1 - 2/k)p^2$ neighbors in common. For $p = n^{-1/2+\epsilon}$, these values are $n^{2\epsilon}(1 - 1/k)$ and $n^{2\epsilon}(1 - 2/k)$, respectively. Since, for any given pair of vertices $x, y$, the indicator random variables $X_v$ for the event that $v$ is a neighbor of both $x$ and $y$ are mutually independent over all $v$ (each occurring with probability $p^2$ if $v$ is a different color from both $x$ and $y$, and with probability 0 otherwise), we may apply Chernoff bounds. In particular, if $X = \sum_v X_v$ and $\mu = \mathbb{E}[X]$ is the expected number of neighbors in common between $x$ and $y$, Chernoff bounds state that for any $\delta > 0$,

$$\Pr[X < (1 - \delta)\mu \text{ or } X > (1 + \delta)\mu] \le 2e^{-\delta^2\mu/3}$$

(See Angluin and Valiant [2].) For $\mu = \Omega(n^{2\epsilon})$, this probability is so small that even when summed over all pairs of starting vertices $x, y$, the probability that any pair shares a number of neighbors differing by more than $\delta\mu$ from the expectation is $o(1)$.

One thus finds that, with high probability, all pairs of vertices in the same color class share $n^{2\epsilon}[1 - 1/k](1 + o(1))$ neighbors, and all pairs of vertices of different colors share only $n^{2\epsilon}[1 - 2/k](1 + o(1))$ neighbors in common. Thus, one can easily separate the color classes algorithmically.
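The separation step can be sketched in Python as follows. This is an illustration under our own choices, not the thesis's code: the threshold is the midpoint $n p^2 (1 - \tfrac{3}{2k})$ between the two expectations above, and each vertex is compared only with one representative of each class found so far, which suffices whenever every pair is classified correctly. The parameters $n = 600$, $p = 0.95$, $k = 3$ and the seed are arbitrary demo values chosen so that the two expectations are far apart.

```python
import random

def sample_gnpk(n, p, k, rng):
    """Sample G <- G(n,p,k); returns (adjacency dict of sets, color list)."""
    color = [rng.randrange(k) for _ in range(n)]
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if color[u] != color[v] and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, color

def separate_by_common_neighbors(adj, threshold):
    """Put v into the class of representative r iff |N(v) ∩ N(r)| > threshold."""
    classes = []
    for v in sorted(adj):
        for cls in classes:
            if len(adj[v] & adj[cls[0]]) > threshold:
                cls.append(v)
                break
        else:                      # no representative matched: start a new class
            classes.append([v])
    return classes

n, p, k = 600, 0.95, 3
adj, color = sample_gnpk(n, p, k, random.Random(2024))
# Midpoint between n(1 - 1/k)p^2 (same class) and n(1 - 2/k)p^2 (different classes).
threshold = n * p * p * (1 - 1.5 / k)
classes = separate_by_common_neighbors(adj, threshold)
print(len(classes), [len(c) for c in classes])
```

At much lower densities the two counts are no longer separated reliably, which is what motivates the path-counting approach of Section 7.1.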

7.1 An improved algorithm 

In this section, we describe an algorithm, based on an extension of the above idea, that $k$-colors graphs in $\mathcal{G}(n,p,k)$ for much lower values of $p$. The results presented here are based on joint work with Joel Spencer.

Another way to view the above observation is that vertices of the same color will have more paths of length 2 between them than vertices of different colors. This idea can be extended to paths of a longer constant length $l$ for improved bounds. If $l$ is even, it turns out (see Section 7.1.1) that the expected number of paths of length $l$ between two vertices of the same color is higher than the expected number for vertices of different colors. If $l$ is odd, the reverse holds. The difficulty in analyzing the case $l > 2$, however, is that the events corresponding to the paths of length $l$ between two vertices are no longer independent. Different paths of length 2 between two vertices $x$ and $y$ share no edges in common, but two paths of length 3 might share an edge: for example, consider the two paths $(x, w, w', y)$ and $(x, w', w, y)$. So, to prove that the number of paths will with high probability be close to the number expected, one needs a more sophisticated probabilistic analysis. Luckily, such an analysis for a general class of problems of this type has already been provided by Spencer [35] in the context of the random graph model $\mathcal{G}(n,p)$. It turns out that the same analysis holds for the $\mathcal{G}(n,p,k)$ model as well.

7.1.1 Calculating expectations 

Let $l \ge 2$ be some integer constant and let us fix two vertices $x$ and $y$. By a "path" we will always mean a simple path; that is, one that never touches any vertex more than once. In this section, we calculate the expected number of paths of length $l$ between $x$ and $y$ in $G \leftarrow \mathcal{G}(n,p,k)$ and show that this expectation differs by a constant factor depending on whether or not $x$ and $y$ are in the same color class.

In particular, let $E_l(p)$ be the expected number of paths of length $l$ between $x$ and $y$ in $G \leftarrow \mathcal{G}(n,p,k)$, and let $E_l^{\text{same}}(p)$ and $E_l^{\text{diff}}(p)$ be the expected number of such paths given that $x$ and $y$ are chosen in the same or in different color classes, respectively. Also, for $p > 0$, let $\lambda_l(p) = [E_l^{\text{same}}(p) - E_l^{\text{diff}}(p)]/E_l(p)$. When $p$ is clear from context, we will just write $E_l$, $E_l^{\text{same}}$, $E_l^{\text{diff}}$, and $\lambda_l$ for the above quantities. We show now that for constant $k$ and $l$, the value $\lambda_l$ is bounded away from 0 by a constant.

We can calculate the expected number of paths between $x$ and $y$ by fixing some arbitrary sequence of distinct vertices $v_1, \ldots, v_{l-1}$ (also distinct from $x$ and $y$) and calculating the probability of the event $B_l$ that each pair $(x, v_1), (v_1, v_2), \ldots, (v_{l-2}, v_{l-1}), (v_{l-1}, y)$ consists of vertices chosen in different color classes. Given that the event $B_l$ occurs, the probability that the path $(x, v_1, \ldots, v_{l-1}, y)$ appears in $G$ is simply $p^l$. Given that $B_l$ does not occur, the probability is 0. Since there are $(n-2)_{l-1} = (n-2)(n-3)\cdots(n-l) = n^{l-1}(1 - o(1))$ possible such sequences $v_1, \ldots, v_{l-1}$, the expectation $E_l$ is simply $[1 - o(1)]p^l n^{l-1}\Pr[B_l]$.

For any event $X$, let $\Pr^{\text{same}}[X]$ and $\Pr^{\text{diff}}[X]$ be the probability that event $X$ occurs given that $x$ and $y$ are in the same color class, or given that $x$ and $y$ are in different color classes, respectively. Thus, we have:

$$E_l = E_l(p) = [1 - o(1)]\,p^l n^{l-1}\Pr[B_l], \tag{7.1}$$

$$E_l^{\text{same}} = E_l^{\text{same}}(p) = [1 - o(1)]\,p^l n^{l-1}\Pr^{\text{same}}[B_l], \tag{7.2}$$

$$E_l^{\text{diff}} = E_l^{\text{diff}}(p) = [1 - o(1)]\,p^l n^{l-1}\Pr^{\text{diff}}[B_l]. \tag{7.3}$$
Also, since the $p^l(n-2)_{l-1}$ terms factor out of the expression for $\lambda_l$, we have:

$$\lambda_l = \lambda_l(p) = \frac{\Pr^{\text{same}}[B_l] - \Pr^{\text{diff}}[B_l]}{\Pr[B_l]}. \tag{7.4}$$

So, to compute $E_l$, $E_l^{\text{same}}$, $E_l^{\text{diff}}$, and $\lambda_l$, we need only examine the fixed sequence $v_1, \ldots, v_{l-1}$ and the event that all are chosen colors such that the path $(x, v_1, \ldots, v_{l-1}, y)$ is a "potential path" in the graph.

The value $\Pr[B_l]$ is quite easy to calculate: each vertex in the path after $x$ has a $(1 - \frac{1}{k})$ probability of being given a different color than the preceding vertex. Thus,

$$\Pr[B_l] = \left(1 - \tfrac{1}{k}\right)^l. \tag{7.5}$$

Also, clearly, $\Pr[B_l] = \Pr^{\text{same}}[B_l] \cdot \Pr[x \text{ and } y \text{ are chosen the same color}] + \Pr^{\text{diff}}[B_l] \cdot \Pr[x \text{ and } y \text{ are chosen of different colors}]$. So,

$$\Pr[B_l] = \tfrac{1}{k}\Pr^{\text{same}}[B_l] + \left(1 - \tfrac{1}{k}\right)\Pr^{\text{diff}}[B_l]. \tag{7.6}$$

So, from equations (7.5) and (7.6), in order to calculate $\Pr^{\text{same}}[B_l]$ and $\Pr^{\text{diff}}[B_l]$, it suffices to prove the following theorem.

Theorem 7.1 $\Pr^{\text{same}}[B_l] - \Pr^{\text{diff}}[B_l] = (-1)^l \left(\frac{1}{k}\right)^{l-1}$.

Proof: Define the following events $A_t$ and $B_t$ for $t \le l$. Notice that this definition of $B_t$ coincides with the previous definition of $B_l$ for $t = l$.

• For $t \le l$, let $A_t$ be the event that each pair $(x, v_1), (v_1, v_2), \ldots, (v_{t-2}, v_{t-1})$ consists of vertices chosen in different color classes.

• For $t \le l$, let $B_t$ be the event that $A_t$ occurs and, in addition, vertex $v_{t-1}$ is chosen in a different color class from $y$.

Also, for convenience, let $A_t - B_t$ be the event that $A_t$ occurs and $B_t$ does not. Notice that since $B_t \subseteq A_t$, we have $\Pr[A_t - B_t] = \Pr[A_t] - \Pr[B_t]$. Also note that event $A_t$ does not depend on whether $x$ and $y$ are chosen in the same or different color classes.

The probability of event $A_t$ is easy to calculate: we just need $v_1$ to be a different color from $x$, $v_2$ a different color from $v_1$, and so on up to $v_{t-1}$. Thus:

$$\Pr[A_t] = (1 - 1/k)^{t-1}. \tag{7.7}$$

For $t = 2$, event $B_2$ is the event that $v_1$ is a different color from both $x$ and $y$, so $\Pr^{\text{same}}[B_2] = 1 - 1/k$ and $\Pr^{\text{diff}}[B_2] = 1 - 2/k$. For $t > 2$, event $B_t$ occurs if either: (1) $B_{t-1}$ occurs and $v_{t-1}$ is of one of the $k - 2$ colors not used by $y$ or $v_{t-2}$, or (2) event $A_{t-1} - B_{t-1}$ occurs and $v_{t-1}$ is of one of the $k - 1$ colors not used by $y$ or $v_{t-2}$ (which now share a color). Thus,

$$\begin{aligned} \Pr^{\text{same}}[B_t] &= \Pr^{\text{same}}[B_{t-1}](1 - 2/k) + \bigl(\Pr[A_{t-1}] - \Pr^{\text{same}}[B_{t-1}]\bigr)(1 - 1/k) \\ &= (1 - 1/k)^{t-1} - \tfrac{1}{k}\Pr^{\text{same}}[B_{t-1}]. \end{aligned} \tag{7.8}$$

Similarly,

$$\Pr^{\text{diff}}[B_t] = (1 - 1/k)^{t-1} - \tfrac{1}{k}\Pr^{\text{diff}}[B_{t-1}]. \tag{7.9}$$

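Thus, we can solve for $\Pr^{\text{same}}[B_l]$ as follows:

$$\begin{aligned} \Pr^{\text{same}}[B_l] &= (1 - 1/k)^{l-1} - \tfrac{1}{k}\left[(1 - 1/k)^{l-2} - \tfrac{1}{k}\Pr^{\text{same}}[B_{l-2}]\right] \\ &= (1 - 1/k)^{l-1} - \tfrac{1}{k}(1 - 1/k)^{l-2} + \tfrac{1}{k^2}\left[(1 - 1/k)^{l-3} - \tfrac{1}{k}\Pr^{\text{same}}[B_{l-3}]\right] \\ &= (1 - 1/k)^{l-1} - \tfrac{1}{k}(1 - 1/k)^{l-2} + \tfrac{1}{k^2}(1 - 1/k)^{l-3} - \cdots + (-1)^{l-2}\left(\tfrac{1}{k}\right)^{l-2}\Pr^{\text{same}}[B_2]. \end{aligned} \tag{7.10}$$

Similarly,

$$\Pr^{\text{diff}}[B_l] = (1 - 1/k)^{l-1} - \tfrac{1}{k}(1 - 1/k)^{l-2} + \cdots + (-1)^{l-2}\left(\tfrac{1}{k}\right)^{l-2}\Pr^{\text{diff}}[B_2]. \tag{7.11}$$

From equations (7.10) and (7.11), we have:

$$\begin{aligned} \Pr^{\text{same}}[B_l] - \Pr^{\text{diff}}[B_l] &= (-1)^{l-2}\left(\tfrac{1}{k}\right)^{l-2}\left[\Pr^{\text{same}}[B_2] - \Pr^{\text{diff}}[B_2]\right] \\ &= (-1)^l\left(\tfrac{1}{k}\right)^{l-2}\left[(1 - 1/k) - (1 - 2/k)\right] \\ &= (-1)^l\left(\tfrac{1}{k}\right)^{l-1}. \end{aligned} \tag{7.12}$$

Thus, we have proven the theorem. ■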
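Theorem 7.1 and equations (7.5) and (7.6) can be cross-checked by a direct computation that does not use the recurrences (7.8) and (7.9). The Python sketch below (names are ours) fixes the color of $x$ to 0 by symmetry, tracks the exact distribution of the color of the current vertex restricted to the event $A_t$, and finally requires $v_{l-1}$ to differ from $y$.

```python
from fractions import Fraction

def prob_B(l: int, k: int, same: bool) -> Fraction:
    """Pr^same[B_l] (same=True) or Pr^diff[B_l] (same=False), computed exactly."""
    y_color = 0 if same else 1            # WLOG x has color 0
    dist = [Fraction(0)] * k
    dist[0] = Fraction(1)                 # color of the current vertex, restricted to A_t
    for _ in range(l - 1):                # color v_1, ..., v_{l-1} uniformly at random
        total = sum(dist)
        # v_t lands on color c and differs from v_{t-1}
        dist = [(total - dist[c]) / k for c in range(k)]
    return sum(d for c, d in enumerate(dist) if c != y_color)   # v_{l-1} differs from y

def lambda_l(l: int, k: int) -> Fraction:
    """lambda_l from (7.4), using Pr[B_l] = (1 - 1/k)^l from (7.5)."""
    return (prob_B(l, k, True) - prob_B(l, k, False)) / (1 - Fraction(1, k)) ** l

for l in range(1, 6):
    print(l, prob_B(l, 3, True) - prob_B(l, 3, False), lambda_l(l, 3))
```

For $k = 3$ the printed differences are $-1, 1/3, -1/9, 1/27, -1/81$, as Theorem 7.1 predicts. The computation also suggests the closed form $\lambda_l = (-1)^l k/(k-1)^l$, which follows by dividing (7.12) by (7.5).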

By Theorem 7.1 and equation (7.4), we have $\lambda_l = [(-1)^l/k^{l-1}]/\Pr[B_l]$, so:

$$(-1)^l \lambda_l \ge 1/k^{l-1}. \tag{7.13}$$

Thus, for $l$ and $k$ both constant, $\lambda_l$ is bounded away from 0 by some constant. Also, note by equations (7.2) and (7.3) that for $p \ge n^{-1+\epsilon}$ for some constant $\epsilon > 0$, and for sufficiently large integer $l$ (in fact $l \ge \lceil 2/\epsilon \rceil$), we have $E_l^{\text{same}}, E_l^{\text{diff}} = \Omega(n)$.
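The bound $l \ge \lceil 2/\epsilon \rceil$ comes from the exponent arithmetic in (7.2) and (7.3): with $p = n^{-1+\epsilon}$, the dominant factor $p^l n^{l-1}$ equals $n^{l\epsilon - 1}$, which is at least $n$ exactly when $l\epsilon \ge 2$. A tiny Python check with exact rationals (the function names are ours; constant factors are dropped):

```python
import math
from fractions import Fraction

def min_path_length(eps: Fraction) -> int:
    """Smallest integer l with l * eps >= 2, i.e. ceil(2 / eps)."""
    return math.ceil(2 / eps)

def log_n_expected_paths(l: int, eps: Fraction) -> Fraction:
    """Exponent of n in p^l * n^(l-1) for p = n^(-1+eps), constants dropped."""
    return l * (eps - 1) + (l - 1)

for eps in (Fraction(1, 4), Fraction(3, 10), Fraction(1, 7)):
    l = min_path_length(eps)
    print(eps, l, log_n_expected_paths(l, eps), log_n_expected_paths(l - 1, eps))
```

In each row the exponent is at least 1 for the chosen $l$ and below 1 for $l - 1$.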

7.1.2 Analysis and the $l$-path algorithm

Note the following property of paths of constant length $l$ between fixed vertices $x$ and $y$. The number of edges in the path divided by the number of "non-rooted" vertices (that is, vertices other than $x$ and $y$) is $l/(l-1)$. For any proper subgraph $S$ of such a path (containing at least one non-rooted vertex), the quantity $|E(S)|/|V(S) - \{x, y\}|$ is strictly less. Because this ratio of edges to non-rooted vertices is strictly less for all proper subgraphs, we say that paths between $x$ and $y$ are "strictly balanced". (A definition of "strictly balanced" for more general "rooted graphs" appears in Definition 8.3 of Chapter 8.)
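The strict-balance property can be checked by brute force for small $l$. This Python sketch (the function name is ours) labels the path $x = 0, v_1 = 1, \ldots, v_{l-1} = l-1, y = l$, keeps both roots and a nonempty proper subset of the non-rooted vertices together with every path edge among them, and reports the largest edge-to-vertex ratio. Subgraphs that keep all non-rooted vertices but drop an edge have ratio below $l/(l-1)$ trivially, so they are not enumerated.

```python
from fractions import Fraction
from itertools import combinations

def max_proper_ratio(l: int) -> Fraction:
    """Largest |E(S)| / |V(S) - {x, y}| over subgraphs S of the length-l path that
    keep both roots and a nonempty proper subset of the non-rooted vertices
    (with every path edge among the kept vertices, which only raises the ratio)."""
    edges = [(t, t + 1) for t in range(l)]
    best = Fraction(0)
    for size in range(1, l - 1):
        for chosen in combinations(range(1, l), size):
            kept = {0, l} | set(chosen)
            e = sum(1 for a, b in edges if a in kept and b in kept)
            best = max(best, Fraction(e, size))
    return best

for l in range(2, 8):
    print(l, max_proper_ratio(l), Fraction(l, l - 1))
```

For every $l \ge 3$ the maximum over proper subgraphs is exactly 1, strictly below $l/(l-1)$.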

Spencer [35] proves that for any such strictly balanced graph and any constants $\delta, c > 0$, if the expected number of copies of the graph in $G \leftarrow \mathcal{G}(n,p)$ is at least $K\log n$ for sufficiently large $K$, then the actual number of copies of the graph in $G \leftarrow \mathcal{G}(n,p)$ will be within a factor $(1 \pm \delta)$ of the expectation with probability $1 - o(n^{-c})$. In Appendix B, we prove a slightly weaker (and simpler to prove) analog of Spencer's theorem for the model $\mathcal{G}(n,p,k)$. A special case of the analog is the following. Let $\mathrm{Num}_l(G)$ be the number of paths of length $l$ between $x$ and $y$ in $G$.
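For experiments on small graphs, $\mathrm{Num}_l(G)$ can be computed by a depth-first search over simple paths. This Python sketch (the function name is ours) takes exponential time in $l$, so it is only an illustration, not part of the algorithm's analysis. On $K_4$ it also exhibits the pair of length-3 paths $(x, w, w', y)$ and $(x, w', w, y)$ mentioned earlier, which share the edge $(w, w')$.

```python
def count_paths(adj, x, y, l):
    """Num_l(G): number of simple paths with exactly l edges from x to y (x != y)."""
    def extend(v, remaining, visited):
        if remaining == 0:
            return 1 if v == y else 0
        total = 0
        for w in adj[v]:
            if w in visited:
                continue
            if w == y and remaining > 1:   # y may appear only as the last vertex
                continue
            visited.add(w)
            total += extend(w, remaining - 1, visited)
            visited.remove(w)
        return total
    return extend(x, l, {x})

k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print([count_paths(k4, 0, 1, l) for l in range(1, 5)])  # [1, 2, 2, 0]
```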

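Corollary 7.2 (Corollary to Theorem B.2) For any constants $\delta, c > 0$, if $l$ and $p$ are such that $K \log n \le E_l^{\mathrm{same}}(p), E_l^{\mathrm{diff}}(p) \le n^{\epsilon^*}$ for sufficiently large $K$ and sufficiently small $\epsilon^*$, then for $G \leftarrow \mathcal{G}(n,p,k)$:

1. $\Pr_{\mathrm{same}}\left[(1-\delta)E_l^{\mathrm{same}} \le \mathrm{Num}_l(G) \le (1+\delta)E_l^{\mathrm{same}}\right] \ge 1 - o(n^{-c})$,

2. $\Pr_{\mathrm{diff}}\left[(1-\delta)E_l^{\mathrm{diff}} \le \mathrm{Num}_l(G) \le (1+\delta)E_l^{\mathrm{diff}}\right] \ge 1 - o(n^{-c})$.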

So, if the expected number of paths is sufficiently large, but not too large, then (taking $c = 2$) we can be assured that with probability $1 - o(n^{-2})$, the number of paths between $x$ and $y$ will be close to the expectation.¹

For convenience, since $\Lambda_l < 1$, define a constant $\epsilon' > 0$ such that:

$$E_l \in [n^{\epsilon'}, 2n^{\epsilon'}] \implies E_l^{\mathrm{same}}, E_l^{\mathrm{diff}} \in [K \log n, n^{\epsilon^*}]. \qquad (7.14)$$

By equation (7.13), for $l$ constant, the value $|\Lambda_l|$ is at least some constant greater than 0. Also, as noted at the end of Section 7.1.1, for $p = n^{-1+\epsilon}$ for any constant $\epsilon > 0$, there exists an integer $l$ such that $E_l^{\mathrm{same}}, E_l^{\mathrm{diff}} = \Omega(n)$. We want $l$ such that $E_l \in [n^{\epsilon'}, 2n^{\epsilon'}]$, but such an integer $l$ might not exist. We can handle this difficulty by noting the following fact.

Fact 7.1 Let $\mathcal{G}_q(n,p,k)$ be the model in which we first select $G \leftarrow \mathcal{G}(n,p,k)$ and then delete each edge independently with probability $q$. Then $\mathcal{G}_q(n,p,k) = \mathcal{G}(n, p(1-q), k)$.

That is, if we delete each edge in graph $G \leftarrow \mathcal{G}(n,p,k)$ with some probability $q$, then the distribution obtained is exactly the same as if we had just put each edge into the graph with probability $p(1-q)$ in the first place. So, given a graph $G \leftarrow \mathcal{G}(n,p,k)$ and a value $l$ such that $E_l \ge 2n^{\epsilon'}$, if we delete edges at random from $G$ with probability $q$ so that the expected number of paths between $x$ and $y$ is between $n^{\epsilon'}$ and $2n^{\epsilon'}$, then we can apply Corollary 7.2 to the resulting graph. We now present the algorithm $l$-path.

¹In fact, the restriction that $E_l^{\mathrm{same}}, E_l^{\mathrm{diff}} \le n^{\epsilon^*}$ is most certainly not necessary; we leave for future work showing that Spencer's theorem goes through for expectations greater than $n^{\epsilon^*}$ as well.
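Choosing $q$ needs only one scaling property: a fixed path of length $l$ survives independent edge deletion only if all $l$ of its edges do, so $E_l(p(1-q)) = (1-q)^l E_l(p)$ and hence $q = 1 - (\text{target}/E_l(p))^{1/l}$. The Python sketch below assumes the caller already has a numerical value for $E_l(p)$ (the thesis computes it from equations (7.1) and (7.5), not reproduced here); the function names and the target $1.5\,n^{\epsilon'}$ are illustrative choices only.

```python
import math
import random


def deletion_probability(expected_paths: float, l: int, target: float) -> float:
    """Return q with (1 - q)**l * expected_paths == target (q = 0 if no thinning is needed).

    Relies only on E_l(p(1 - q)) = (1 - q)**l * E_l(p): each length-l path
    survives independent edge deletion with probability (1 - q)**l.
    """
    if expected_paths <= target:
        return 0.0
    return 1.0 - (target / expected_paths) ** (1.0 / l)


def thin_edges(edges, q: float, rng: random.Random):
    """Fact 7.1: keep each edge independently with probability 1 - q."""
    return [e for e in edges if rng.random() >= q]


# Example: aim for 1.5 * n^eps' expected paths (1.5 is an arbitrary point
# inside [n^eps', 2 n^eps']), starting from E_l(p) = 1e9.
n, eps_prime, l = 10**6, 0.1, 3
target = 1.5 * n ** eps_prime
q = deletion_probability(expected_paths=1e9, l=l, target=target)
```

Solving for $q$ in closed form avoids any search, because the path count depends on $q$ only through the factor $(1-q)^l$.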

Algorithm $l$-path

Given: An $n$-vertex $k$-colorable graph $G$.
Output: A $k$-coloring of $G$ or else failure.

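1. Let $d_{\mathrm{avg}}$ be the average degree in $G$ and let $\hat{p} = d_{\mathrm{avg}}/\left(n(1 - 1/k)\right)$.

(So, if $G \leftarrow \mathcal{G}(n,p,k)$ for $p = n^{-1+\epsilon}$, then with high probability $\hat{p} = p[1 + o(1)]$.)
Pick $l$ such that $\hat{p}^{\,l} n^{l-1} = \Omega(n)$ and let $\Lambda = 1/k^{l-1}$.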

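2. Randomly delete each edge in $G$ with probability $q$ so that $E_l(\hat{p}(1-q))$ lies in a fixed range inside $[n^{\epsilon'}, 2n^{\epsilon'}]$ that stays a constant factor away from both endpoints, where $\epsilon'$ is such that Corollary 7.2 holds for $c = 2$ and $\delta = \Lambda/4$, and $E_l(\hat{p}(1-q))$ is calculated using equations (7.1) and (7.5).

Let $\hat{E}_l^{\mathrm{same}} = E_l^{\mathrm{same}}(\hat{p}(1-q))$, calculated using equations (7.2) and (7.10).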

3. For $i = 1$ to $k$ do:

(a) Pick an arbitrary uncolored vertex $x$ and let $S_i$ be the set containing $x$ and all vertices with a number of paths of length $l$ to $x$ in the range:

$$\left[(1 - \Lambda/3)\hat{E}_l^{\mathrm{same}},\ (1 + \Lambda/3)\hat{E}_l^{\mathrm{same}}\right].$$

If the set $S_i$ is not independent or $S_i$ contains previously-colored vertices, then halt with failure.

(b) If $S_i$ is independent, then assign color $i$ to all vertices of $S_i$.

4. If in Step 3 we assigned one of k colors to each vertex in the graph, then halt 
with success. If we did not color each vertex, then halt with failure. 
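Steps 3 and 4 of Algorithm $l$-path can be sketched in Python as follows. This is a hypothetical illustration, not the thesis's code: it assumes the graph is a dict mapping each vertex to its set of neighbors, counts simple paths by exhaustive depth-first search, and takes $\hat{E}_l^{\mathrm{same}}$ and $\Lambda$ as inputs rather than computing them from equations (7.2) and (7.10). Steps 1 and 2 (estimating $\hat{p}$ and thinning the edges) are omitted.

```python
from typing import Optional

Graph = dict[int, set[int]]


def count_simple_paths(g: Graph, x: int, y: int, l: int) -> int:
    """Number of simple paths with exactly l edges from x to y (exhaustive DFS)."""
    def extend(v: int, remaining: int, visited: set[int]) -> int:
        if remaining == 0:
            return 1 if v == y else 0
        total = 0
        for w in g[v]:
            if w in visited or (w == y and remaining > 1):
                continue  # keep the path simple; y may only be the last vertex
            visited.add(w)
            total += extend(w, remaining - 1, visited)
            visited.remove(w)
        return total
    return extend(x, l, {x})


def l_path_steps_3_4(g: Graph, k: int, l: int, e_same: float, lam: float) -> Optional[dict[int, int]]:
    """Steps 3-4 of Algorithm l-path; returns a k-coloring, or None for failure."""
    color: dict[int, int] = {}
    lo, hi = (1 - lam / 3) * e_same, (1 + lam / 3) * e_same
    for i in range(k):
        uncolored = [v for v in g if v not in color]
        if not uncolored:
            break  # every vertex already has a color
        x = uncolored[0]  # "an arbitrary uncolored vertex"
        s = {x} | {y for y in g if y != x and lo <= count_simple_paths(g, x, y, l) <= hi}
        if any(v in color for v in s) or any(g[v] & s for v in s):
            return None  # S_i meets earlier classes or is not independent
        for v in s:
            color[v] = i
    return color if len(color) == len(g) else None
```

For example, in the complete 3-partite graph $K_{2,2,2}$ two same-class vertices have 4 paths of length 2 between them and two different-class vertices have 2, so with $l = 2$, $\hat{E}_l^{\mathrm{same}} = 4$ and $\Lambda = 1/3$ the sketch recovers the three classes.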

Theorem 7.3 Algorithm $l$-path $k$-colors graphs $G \leftarrow \mathcal{G}(n,p,k)$ with high probability for $p \ge n^{-1+\epsilon}$ for any constant $\epsilon > 0$.

Proof: Let $C_1, \ldots, C_k$ be the sets of vertices in each color class in the creation of graph $G$ in model $\mathcal{G}(n,p,k)$. Let us say that Step 3 succeeds in iteration $i$ if the set $S_i$ created equals $C_j$ for some $1 \le j \le k$.

In Step 1, as noted, with high probability $\hat{p} = p[1 + o(1)]$; let us assume for convenience that this is the case. So, $E_l(\hat{p}(1-q)) = [1 + o(1)]E_l(p(1-q))$ and $E_l^{\mathrm{same}}(\hat{p}(1-q)) = [1 + o(1)]E_l^{\mathrm{same}}(p(1-q))$. Let $E_l^{\mathrm{same}} = E_l^{\mathrm{same}}(p(1-q))$, let $E_l^{\mathrm{diff}} = E_l^{\mathrm{diff}}(p(1-q))$, and let $E_l = E_l(p(1-q))$.

In Step 2, if $E_l(\hat{p}(1-q))$ lies in the range chosen in Step 2, then $E_l \in [n^{\epsilon'}, 2n^{\epsilon'}]$ and Corollary 7.2 applies. Thus, by Corollary 7.2, and since $\delta$ is chosen small enough that the ranges $[(1-\delta)E_l^{\mathrm{same}}, (1+\delta)E_l^{\mathrm{same}}]$ and $[(1-\delta)E_l^{\mathrm{diff}}, (1+\delta)E_l^{\mathrm{diff}}]$ do not overlap, we have the following. With probability $1 - n^2 \cdot o(n^{-2}) = 1 - o(1)$, for every pair of vertices $x, y$ in the same color class $C_j$, and for no pair $x, y$ in different color classes, the number of paths of length $l$ between $x$ and $y$ is in the range $[(1-\delta)E_l^{\mathrm{same}}, (1+\delta)E_l^{\mathrm{same}}]$. Thus, with high probability, Step 3 succeeds in each iteration $i$ and Algorithm $l$-path $k$-colors the graph. ■

Since $l$ is a constant, counting the number of simple paths of length $l$ between two vertices can be done in polynomial time, and so the $l$-path algorithm runs in polynomial time. The running time of the algorithm could be improved considerably by counting non-simple paths as well as simple paths. It is likely that the bounds claimed by Corollary 7.2 can be made to apply to that case as well.
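To make the simple versus non-simple distinction concrete: non-simple path counts (walks) for all pairs at once are the entries of the $l$-th power of the adjacency matrix, so no exhaustive search is needed. The plain-Python sketch below is only an illustration of that remark; whether Corollary 7.2 carries over to walk counts is, as stated above, left open.

```python
def walk_counts(adj: list[list[int]], l: int) -> list[list[int]]:
    """Return A^l, whose (x, y) entry counts walks with exactly l edges from x to y.

    Unlike simple paths, walks may revisit vertices.  Cost: O(l * n^3) with
    naive multiplication (repeated squaring would give O(log l * n^3)).
    """
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _ in range(l):
        power = [[sum(power[i][m] * adj[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return power


# Triangle K_3: one simple path of length 2 from 0 to 1 (0-2-1) and none of
# length 3, but three walks of length 3 (0-1-0-1, 0-2-0-1, 0-1-2-1).
k3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(walk_counts(k3, 2)[0][1], walk_counts(k3, 3)[0][1])  # 1 3
```

The triangle example shows why the two counts differ: walks include the back-and-forth sequences that simple paths exclude, so their expectations and concentration bounds would have to be recomputed.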



Chapter 8 

Semi-random graphs 



The results of Turner [38] and Dyer and Frieze [18] mentioned in the last chapter show that random $k$-colorable graphs, and thus most $k$-colorable graphs, are easy to $k$-color. Random $k$-colorable graphs, however, tend to be of a very special type. For instance, with high probability all vertices have nearly the same degree and all have nearly the same number of edges to each of the other $k - 1$ color classes. So, graphs created in only a "somewhat random" manner may not be colored well by algorithms designed for $\mathcal{G}(n,p,k)$. On the other hand, worst-case assumptions may be overly pessimistic in many situations. To analyze the coloring of graphs in an intermediate range, we consider here two new graph models that lie between the random and worst-case models. These new models provide a smooth transition between the random and worst-case scenarios and are based on the notion of a "semi-random source" from the cryptographic literature. We will call these models the "semi-random" graph models.

8.1 Basic definitions and statement of results 

We define here two graph models, both based on the semi-random source (also called a "slightly-random" source) of Santha and Vazirani [34] (see also [40] [39] [17]). In the first model, which we denote $\mathcal{G}_S(n,p,k)$, the graph is generated as follows. First, an adversary splits the $n$ vertices into $k$ color classes (for $k = 3$, we denote these classes by red, blue, and green). Then, for each pair of vertices $u, v$ where $u$ and $v$ belong to different color classes (running through such pairs in an order of its choosing), the adversary decides whether or not to include edge $(u,v)$ in the graph. Once the adversary has made a choice for a particular edge $(u,v)$, the choice is then reversed with probability $p$. Note that later choices of the adversary may depend on the outcomes of earlier decisions, as in the Santha-Vazirani source [34]. An alternative way to view this model, closer to the point of view used by Santha and Vazirani, is the following. For each pair of vertices $u, v$ belonging to different color classes, the adversary picks a bias $p_{uv}$ between $p$ and $1 - p$ for a coin, which is then flipped to determine whether edge $(u,v)$ is placed in the graph. The adversary may determine the bias $p_{uv}$ based on the outcomes of previous coin tosses. The two views of the model are equivalent: if the adversary in the first description is deterministic, then it can be thought of as selecting $p_{uv} \in \{p, 1 - p\}$; if it is randomized, it can act as if selecting intermediate values. We call $p$ the noise rate of the source, and this model the semi-random graph model.
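The first view of $\mathcal{G}_S(n,p,k)$ translates directly into a sampler. In the Python sketch below, the adversary is modeled as a callback that receives the pair and the outcomes so far; this interface and the fixed lexicographic order of pairs are our simplifications (in the model, the adversary also chooses the order).

```python
import random
from typing import Callable

# Hypothetical adversary interface: (u, v, outcomes so far) -> intended choice.
Adversary = Callable[[int, int, dict], bool]


def sample_semi_random(color_of: list[int], p: float, adversary: Adversary,
                       rng: random.Random) -> set[tuple[int, int]]:
    """Draw an edge set from G_S(n, p, k) for one fixed adversary.

    color_of[v] is the adversary's color class of vertex v.  Only pairs in
    different classes (potential edges) are considered; each intended choice
    is reversed with probability p, and later choices may depend on `outcomes`.
    """
    n = len(color_of)
    outcomes: dict[tuple[int, int], bool] = {}
    for u in range(n):
        for v in range(u + 1, n):
            if color_of[u] == color_of[v]:
                continue  # same class: never an edge
            choice = adversary(u, v, outcomes)
            if rng.random() < p:
                choice = not choice  # the noise reverses the adversary's decision
            outcomes[(u, v)] = choice
    return {pair for pair, present in outcomes.items() if present}
```

For instance, an adversary that always answers "no" yields a graph in which each potential edge appears independently with probability exactly $p$; that is the adversary used later in Theorem 8.3.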
The second model we consider is a slightly modified version of the above, differing in that the sizes of the $k$ color classes are all required to be $\Omega(n)$. We call this second model the balanced semi-random graph model and denote it by $\mathcal{G}_{SB}(n,p,k)$. Following the notation in Chapter 7, we write:

• $G \leftarrow \mathcal{G}_S(n,p,k)$ or $G \leftarrow \mathcal{G}_{SB}(n,p,k)$

to denote that $G$ is selected according to the corresponding model for some unknown adversary. We denote the semi-random and balanced semi-random models for a fixed adversary $A$ by $\mathcal{G}_S^A(n,p,k)$ and $\mathcal{G}_{SB}^A(n,p,k)$ respectively. Formally, we say that an algorithm $k$-colors $G \leftarrow \mathcal{G}_S(n,p,k)$ with high probability (or $k$-colors $G \leftarrow \mathcal{G}_{SB}(n,p,k)$ with high probability) if it does so with high probability for any choice of the adversary.

A nearly equivalent way to view the semi-random models is that each edge between vertices of different color classes is actually placed into the graph with probability exactly $p$, and then an adversary may elect to place additional edges into the graph if it so chooses. This version is perhaps conceptually the most elegant, and an adversary in this version can simulate the adversaries in $\mathcal{G}_S(n,p,k)$ and $\mathcal{G}_{SB}(n,p,k)$, though the converse does not hold. For example, the adversary here could make the graph a complete $k$-partite graph if it so desired; also, the adversary here may make its decisions after all coin tosses have been performed. While this version could conceivably be more difficult for coloring algorithms than the semi-random graph models as defined above, all the algorithms presented in this chapter work under both conditions.

The semi-random models separate the algorithms for coloring random $k$-colorable graphs into two categories. Some of the algorithms for the random model [18][23] are highly dependent on facts such as the edge probabilities all being equal, and are easily defeated by a semi-random source. Others, such as Turner's No-Choice algorithm [38], adapt well to the semi-random model. In particular, Turner's bound of $p \ge n^{-1/k+\epsilon}$ for $k$-coloring $G \leftarrow \mathcal{G}(n,p,k)$ holds in the balanced semi-random model as well.

We present first in Section 8.2 an algorithm that achieves the same bound as Turner's algorithm but with a significantly simpler analysis (and, for 3-colorable graphs, holds in the slightly more general $\mathcal{G}_S(n,p,3)$ model). We then, in Sections 8.3 and 8.4, present an algorithm with better bounds for the balanced model. This algorithm 3-colors graphs in $\mathcal{G}_{SB}(n,p,3)$ with high probability for $p \ge n^{-0.6+\epsilon}$, and more generally, for $k$-colorable graphs, works for $p \ge n^{-2k/((k-1)(k+2))+\epsilon}$. The algorithm of Sections 8.3 and 8.4 requires a more involved analysis, and the use of the Janson inequality for estimating probabilities of "almost" independent events. In Section 8.5 we present some relationships between the coloring problems in the balanced and unbalanced semi-random models.
For convenience, we make the following definition.

Definition 8.1 Let $G \leftarrow \mathcal{G}_S(n,p,k)$ or $G \leftarrow \mathcal{G}_{SB}(n,p,k)$. The pair $(u,v)$ is a potential edge in $G$ if $u$ and $v$ belong to different color classes in the adversary's color scheme.

For a subgraph H of G or a subset U of V(G), we will use colors(H) and colors(U) to 
denote the set of color classes of G that are represented in the subgraph or subset. 

8.2 A first algorithm 

We now consider the models $\mathcal{G}_S(n,p,3)$ and $\mathcal{G}_{SB}(n,p,3)$ of a 3-colorable graph generated by a semi-random source. Although for small constant noise rates $p$, say $p = 0.01$, it appears at first that the adversary has a good deal of power to defeat a coloring algorithm, it turns out that it does not. As previously mentioned, Turner's algorithm [38] actually 3-colors such a graph with high probability for any $p \ge n^{-1/3+\epsilon}$ for constant $\epsilon > 0$.

We present first a different algorithm that achieves the same bound as Turner's, but works for the unbalanced case $\mathcal{G}_S(n,p,3)$ as well and has a much simpler analysis. We then present a straightforward improvement and a natural extension of this algorithm to $k$-colorable graphs (for constant $k$) in the balanced model.

The idea for the simplest algorithm is the following. If in the adversary's color scheme $u \in \mathrm{blue}$ and $v \in \mathrm{green}$, then the shared neighborhood $S = N(u) \cap N(v)$ contains only red vertices. Thus, $N(S) \subseteq \mathrm{blue} \cup \mathrm{green}$. For $p \ge n^{-1/3+\epsilon}$ we show that with high probability, $N(S)$ actually equals the entire set $\mathrm{blue} \cup \mathrm{green}$. So, given $u$ and $v$, one can split $G$ into a 2-colorable portion $N(S)$ and an independent portion $V - N(S)$, and thus 3-color the graph.

Algorithm Two-Stage 

Given: A graph G = (V, E). 

Output: Either a 3-coloring of G or failure. 

1. First try to 2-color G. If that works, halt with success. Otherwise, do the 
following: 

2. For each pair of vertices u,v (think of u as a candidate green node and v as a 
candidate blue node), 

(a) Let $S = N(u) \cap N(v)$.
(b) Let $T = N(S)$.

If $T$ is 2-colorable and $V - T$ is an independent set, then color $T$ blue and green, color $V - T$ red, and halt. Otherwise go to the top of the loop with a different pair $u, v$.

3. If Step 2 did not succeed for any pair u, v, then halt with failure. 
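A direct Python transcription of Algorithm Two-Stage is sketched below. It assumes the graph is a dict from vertices to neighbor sets; the BFS 2-coloring routine and the color labels (0 and 1 for blue and green, 2 for red) are our own choices. Because $S$ and $T$ are symmetric in $u$ and $v$, the sketch tries unordered pairs.

```python
from collections import deque
from itertools import combinations
from typing import Optional

Graph = dict[int, set[int]]


def two_color(g: Graph, part: set[int]) -> Optional[dict[int, int]]:
    """BFS 2-coloring of the subgraph induced by `part`; None if it has an odd cycle."""
    side: dict[int, int] = {}
    for start in part:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in g[a] & part:
                if b not in side:
                    side[b] = 1 - side[a]
                    queue.append(b)
                elif side[b] == side[a]:
                    return None
    return side


def two_stage(g: Graph) -> Optional[dict[int, int]]:
    """Algorithm Two-Stage: returns a 3-coloring (colors 0, 1, 2) or None for failure."""
    everything = set(g)
    coloring = two_color(g, everything)           # Step 1: try to 2-color G
    if coloring is not None:
        return coloring
    for u, v in combinations(g, 2):               # Step 2
        s = g[u] & g[v]                           # (a) S = N(u) ∩ N(v)
        t = set().union(*(g[w] for w in s))       # (b) T = N(S)
        rest = everything - t
        side = two_color(g, t)
        if side is None or any(g[a] & rest for a in rest):
            continue                              # T not 2-colorable or V - T not independent
        return {**side, **{a: 2 for a in rest}}
    return None                                   # Step 3
```

On the complete 3-partite graph $K_{2,2,2}$ the pair $u = 0$, $v = 2$ (from different classes) succeeds, while on $K_4$, which is not 3-colorable, every pair fails and the sketch returns failure.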

Theorem 8.1 (weak version) Algorithm Two-Stage 3-colors $G \leftarrow \mathcal{G}_S(n,p,3)$ with high probability (over the coin tosses of the semi-random source) for $p \ge n^{-1/3+\epsilon}$ and constant $\epsilon > 0$.

Proof: For convenience, let red be the color with the most vertices in the adversary's 3-coloring. If there are either no blue or no green vertices, then we will 2-color the graph at the start. Otherwise, let $u$ be a green vertex and $v$ a blue vertex (in the adversary's 3-coloring). Then the set $S = N(u) \cap N(v)$ contains only red vertices, and so the set $T = N(S) \subseteq \mathrm{blue} \cup \mathrm{green}$. We now prove that with high probability, for $p \ge n^{-1/3+\epsilon}$, the set $T$ contains all the blue and green vertices.

If we view the semi-random source as choosing biases $p_{uv} \in [p, 1-p]$, then the sizes of sets $S$ and $T$ are minimized when each $p_{uv}$ equals $p$. In that case, every vertex in red independently has probability $p^2$ of belonging to $S$. So, using Chernoff bounds, $|S| \ge \frac{1}{2}|\mathrm{red}|p^2 = \Omega(np^2)$ with high probability. Now, each vertex $z \in \mathrm{blue} \cup \mathrm{green}$ with $z \notin \{u, v\}$ has probability $(1-p)^{|S|}$ of not belonging to $T$. The reason is that for $z \notin \{u,v\}$ and each $w \in \mathrm{red}$, the events $A_{z,w}$ that edge $(z,w)$ appears in the graph occur with probability $p$ and are independent of each other and of the choice of $S$. So, we have:

$$\Pr[z \notin T] \le (1-p)^{|S|} \le e^{-p|S|} = e^{-\Omega(np^3)} = e^{-\Omega(n^{3\epsilon})} = o(1/n).$$

That is, with high probability all vertices $z \in \mathrm{blue} \cup \mathrm{green}$ belong to $T$. Thus, with high probability, $T = \mathrm{blue} \cup \mathrm{green}$ and $V - T = \mathrm{red}$, and so for some pair $u, v$ considered, Algorithm Two-Stage succeeds. ■

Note that if the sizes of the color classes are roughly balanced, we can speed up Algorithm Two-Stage considerably by choosing the vertices $u$ and $v$ at random. For instance, if the sizes of the color classes are all within constant factors of each other, then we have a constant probability of selecting two "good" vertices each time.

Algorithm Two-Stage fails when $p$ falls below $n^{-1/3}$ because then the vertices $u \in \mathrm{green}$ and $v \in \mathrm{blue}$ may not share enough neighbors for $N(S)$ to equal $\mathrm{blue} \cup \mathrm{green}$. However, for $p$ below $n^{-1/3}$, the set $S$ might still contain many vertices, and applying additional iterations of the neighbor-taking process can then boost its size if the sizes of the blue, green, and red vertex sets are roughly balanced. In fact, we can consider the following straightforward extension of Algorithm Two-Stage that works in the balanced semi-random model. (See Figure 8.1.)

Figure 8.1: Vertices $u$ and $v$ and sets $S_R$ and $S_G$.

Algorithm $t$-Stage

Given: A graph $G = (V,E)$, and integer $t$.
Output: Either a 3-coloring of $G$ or failure.
For each edge $(u,v)$:

1. Let $S_G^1 = \{u\}$, $S_B^1 = \{v\}$, and $S_R^1 = N(S_G^1) \cap N(S_B^1)$.

2. Let $S_G^2 = N(S_B^1) \cap N(S_R^1)$, $S_B^2 = N(S_G^1) \cap N(S_R^1)$, and $S_R^2 = N(S_G^2) \cap N(S_B^2)$.

3. Let $S_G^3 = N(S_B^2) \cap N(S_R^2)$, $S_B^3 = N(S_G^2) \cap N(S_R^2)$, and $S_R^3 = N(S_G^3) \cap N(S_B^3)$.

...

t. Let $T = N(S_R^{t-1})$.

If $T$ is 2-colorable and $V - T$ is an independent set, then color $T$ blue and green, color $V - T$ red, and halt. Otherwise go to the top of the loop with a different edge $(u,v)$.

If we have not succeeded in Step t for any edge (u, v), then halt with failure. 
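The set computations of Algorithm $t$-Stage follow one pattern, so they can be written as a loop. The Python sketch below (same dict-of-neighbor-sets representation; a hypothetical helper, not the thesis's code) returns the set $T$ for one edge $(u,v)$. The remaining test, that $T$ is 2-colorable and $V - T$ is independent, is the same as in Algorithm Two-Stage. With $t = 2$ the loop body never runs and $T = N(N(u) \cap N(v))$, which is exactly the set used by Algorithm Two-Stage.

```python
Graph = dict[int, set[int]]


def nbhd(g: Graph, xs: set[int]) -> set[int]:
    """N(X): vertices adjacent to at least one vertex of X."""
    return set().union(*(g[x] for x in xs))


def t_stage_T(g: Graph, u: int, v: int, t: int) -> set[int]:
    """Set T of Algorithm t-Stage for the edge (u, v); requires t >= 2."""
    s_g, s_b = {u}, {v}
    s_r = nbhd(g, s_g) & nbhd(g, s_b)           # Step 1: S_R^1
    for _ in range(t - 2):                      # Steps 2, ..., t-1
        # New S_G^i uses the old S_B, new S_B^i uses the old S_G (tuple assignment).
        s_g, s_b = nbhd(g, s_b) & nbhd(g, s_r), nbhd(g, s_g) & nbhd(g, s_r)
        s_r = nbhd(g, s_g) & nbhd(g, s_b)       # S_R^i from the new S_G^i, S_B^i
    return nbhd(g, s_r)                         # Step t: T = N(S_R^{t-1})
```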

For the balanced model, we have the following stronger version of Theorem 8.1.

Theorem 8.1 (strong version) Algorithm $t$-Stage will 3-color $G \leftarrow \mathcal{G}_{SB}(n,p,3)$ with high probability for $p \ge n^{-1/2+\epsilon}$, $\epsilon > 0$ constant, and $t \ge \log_3(1/\epsilon)$.

Algorithm $t$-Stage is, in fact, very similar to Turner's No-Choice algorithm, and his algorithm should achieve this stronger bound as well for $\mathcal{G}_{SB}(n,p,3)$. The algorithm presented here, while a more complicated algorithm, is easier to analyze. However, because we will demonstrate an even better algorithm in the next section, we give here just a proof sketch showing that for $t = 3$ the algorithm will 3-color $G \leftarrow \mathcal{G}_{SB}(n,p,3)$ for $p \ge n^{-5/11+\epsilon}$, and describe more briefly how this is extended.

Proof sketch: Again, if $u$ is green and $v$ is blue, then for all $i$, $S_G^i \subseteq \mathrm{green}$, $S_B^i \subseteq \mathrm{blue}$, and $S_R^i \subseteq \mathrm{red}$, and $T \subseteq \mathrm{blue} \cup \mathrm{green}$. For the given value of $p$, with high probability there will exist an edge between two such vertices $u$ and $v$. Also, the sizes of the sets $S_G^i$, $S_B^i$, $S_R^i$, and $T$ are minimized by the semi-random source that chooses each $p_{uv}$ to equal $p$. One final fact to note is that for each $i \ge 1$, $S_G^i \subseteq S_G^{i+1}$, $S_B^i \subseteq S_B^{i+1}$, and $S_R^i \subseteq S_R^{i+1}$. The general argument now is just repeated application of bounds for large deviations, being somewhat careful about independence. For this proof sketch, we focus on the case $t = 3$ and show that Algorithm $t$-Stage will 3-color $G \leftarrow \mathcal{G}_{SB}(n,p,3)$ for $p = n^{-5/11+\epsilon}$ with high probability. Recall that for $G \leftarrow \mathcal{G}_{SB}(n,p,3)$ the sizes of the sets red, blue, and green are all $\Theta(n)$.

We can imagine that the coin deciding the presence of an edge is not flipped until we actually examine that edge. So, we first examine all edges $(u,w)$ and $(v,w)$ for $w \in \mathrm{red}$ and find that almost surely $|S_R^1| = \Theta(|\mathrm{red}|p^2) = \Theta(np^2)$. Next, for each $z \in \mathrm{green}$, we examine the edges $(z,w)$ for $w \in S_R^1$ and the edge $(z,v)$. For $z \ne u$, these are all previously unexamined edges. So, for $z \in \mathrm{green} - \{u\}$ we have:

$$\begin{aligned}
\Pr[z \in S_G^2] &= p\left(1 - (1-p)^{|S_R^1|}\right)\\
&= p^2|S_R^1|\,(1 + o(1)) \qquad (\text{using } p|S_R^1| = o(1)).
\end{aligned}$$

Thus, with high probability, $|S_G^2| = \Theta(|\mathrm{green}|\,np^4) = \Theta(n^2p^4)$, and similarly $|S_B^2| = \Theta(n^2p^4)$. Now, for each $z \in \mathrm{red} - S_R^1$ we examine the edges $(z,w)$ and $(z,w')$ for $w \in S_G^2 - S_G^1$ and $w' \in S_B^2 - S_B^1$. Again, these are all previously unexamined edges, so the same argument as above shows that the probability that $z$ belongs to $S_R^2$ is proportional to $p^2|S_G^2 - S_G^1|\,|S_B^2 - S_B^1|$. Thus, with high probability, $|S_R^2 - S_R^1| = \Theta(n^5p^{10})$. Finally, we have $T = N(S_R^2)$. Notice that the set $T$ contains $S_G^2 \cup S_B^2$ and that for each vertex $z \in (\mathrm{blue} \cup \mathrm{green}) - (S_G^2 \cup S_B^2)$, we have not yet examined the edges $(z,w)$ for $w \in S_R^2 - S_R^1$. So, for each such vertex $z$,

$$\begin{aligned}
\Pr[z \notin T] &\le (1-p)^{|S_R^2 - S_R^1|}\\
&= (1-p)^{\Theta(n^5p^{10})}\\
&\le e^{-\Theta(n^5p^{11})}\\
&= o(1/n), \qquad \text{for } p = n^{-5/11+\epsilon}.
\end{aligned}$$

So, with high probability, $T = \mathrm{blue} \cup \mathrm{green}$.

More generally, for $p = n^{-1/2+\epsilon}$, if $|S_R^i| = \Theta(n^{x_i}p^{2x_i})$, we will have with high probability that $|S_R^{i+1}| = \Theta(n^{x_{i+1}}p^{2x_{i+1}})$ for $x_{i+1} = 3x_i + 2$, so long as $n^{x_{i+1}}p^{2x_{i+1}} = o(n)$. Since we begin with $|S_R^1| = \Theta(n^{2\epsilon})$ and at each step the exponent of $n$ in $|S_R^i|$ (namely $2x_i\epsilon$) more than triples, we can continue under the assumption that $n^{x_{i+1}}p^{2x_{i+1}} = o(n)$ for at most $\log_3(1/\epsilon)$ iterations. Suppose $i$ and $p$ are such that $n^{x_{i+1}}p^{2x_{i+1}} \ne o(n)$. Since the probability that Algorithm $t$-Stage succeeds can only increase with larger $p$, we may for purposes of analysis decrease $p$ so that $n^{x_{i+1}}p^{2x_{i+1}} = \sqrt{n}$. We will thus have, for $z \in (\mathrm{blue} \cup \mathrm{green}) - (S_B^{i+1} \cup S_G^{i+1})$, that $\Pr[z \notin T] \le (1-p)^{\Theta(\sqrt{n})} \le e^{-\Theta(\sqrt{n}\,p)} = o(1/n)$. So, the set $T$ equals $\mathrm{blue} \cup \mathrm{green}$ with high probability. ■
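The thresholds in this sketch follow a simple pattern. Reading the argument as requiring $p\,|S_R^{t-1}| = n^{x}p^{2x+1}$ to grow polynomially, with $x = x_{t-1}$, $x_1 = 1$ and $x_{i+1} = 3x_i + 2$, Algorithm $t$-Stage needs $p = n^{-a_t+\epsilon}$ with $a_t = x_{t-1}/(2x_{t-1}+1)$. This recovers $n^{-1/3}$ for Two-Stage ($t = 2$) and $n^{-5/11}$ for $t = 3$, and $a_t \to 1/2$. The short Python computation below is our own derivation from the sketch, not a statement made in the thesis.

```python
from fractions import Fraction


def stage_exponents(t_max: int) -> list[Fraction]:
    """a_t for t = 2, ..., t_max, where t-Stage is argued to work for p = n^(-a_t + eps).

    x_1 = 1 (|S_R^1| = Theta(n p^2)), x_{i+1} = 3 x_i + 2, and T covers
    blue ∪ green once p * n^x p^(2x) = n^(x - a(2x + 1)) grows polynomially.
    """
    x, out = 1, []
    for _ in range(2, t_max + 1):
        out.append(Fraction(x, 2 * x + 1))
        x = 3 * x + 2
    return out


print(stage_exponents(5))
# [Fraction(1, 3), Fraction(5, 11), Fraction(17, 35), Fraction(53, 107)]
```

Since $1/2 - a_t = 1/(2(2x_{t-1}+1))$ and $x_t$ roughly triples each stage, the gap to $n^{-1/2}$ shrinks geometrically, consistent with the $t \ge \log_3(1/\epsilon)$ condition in the theorem.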

It is interesting to note that Algorithm Two-Stage (or $t$-Stage) extends naturally to graphs in $\mathcal{G}_{SB}(n,p,k)$ for constant $k > 3$. The idea is that instead of selecting two vertices $u, v$ at the start, we select $k - 2$ sets $U_2, \ldots, U_{k-1}$ at the start, each $U_i$ consisting of $i$ vertices. For some such choice, the vertices of $U_{k-1}$ are all of different colors in $\mathrm{colors}(G)$, the vertices of $U_{k-2}$ are all of different colors in $\mathrm{colors}(U_{k-1})$, and so forth. (That is, the vertices of $U_{i-1}$ are all of different colors, and each is of a color used in $U_i$.) So, for $T_k = V(G)$ and $U_i = \{u_i^1, \ldots, u_i^i\}$, for each $i = k, k-1, \ldots, 3$ in turn simply let

$$T_{i-1} = N_{T_i}\left(N_{T_i}(u_{i-1}^1) \cap \cdots \cap N_{T_i}(u_{i-1}^{i-1})\right),$$

where $N_{T_i}(X) = N(X) \cap T_i$. With high probability, for $p \ge n^{-1/k+\epsilon}$, we will be able to assign one color to each set $T_i - T_{i-1}$ for $i \ge 3$ and two colors to the set $T_2$, and thus $k$-color the graph. This yields the same bounds as those achieved by Turner. Again, we will not go into the analysis in detail, because in the next section we show how a quite different idea can be used to get even better bounds.

8.3 A better algorithm: k = 3 

We now describe a different style of algorithm that improves upon the above bound in the balanced case, 3-coloring graphs $G \leftarrow \mathcal{G}_{SB}(n,p,3)$ with high probability for $p \ge n^{-0.6+\epsilon}$. The algorithm, while quite simple, requires a more involved probabilistic analysis than the previous one. In particular, we will need to use the Janson inequality [11] to bound probabilities of "nearly" independent events based on pairwise dependencies.¹

The algorithm is based on the following simple observation. If in a 3-colorable graph G 
there are two vertices x and y both adjacent to a pair of vertices u and v that are adjacent 
to each other, then x and y must be the same color in any legal 3-coloring. We call the 
subgraph induced by {x,u,v,y} a link between x and y. (See Figure 8.2). 



¹The results described in this section and Section 8.4 are based on joint work with Joel Spencer.

Figure 8.2: A link between x and y.

At first glance, it would seem the above observation does not help, since for fixed vertices $x$ and $y$, the probability that there exists a link between $x$ and $y$ is at most $O(n^2p^5)$. (There are $O(n^2)$ possible pairs $(u,v)$, and for each pair the probability that all necessary edges exist is $p^5$.) Thus, the probability that there is a link between $x$ and $y$ is much less than 1 even for $p = o(n^{-0.4})$.

The key fact to note, however, is that we do not need a link between every pair of, say, 
red vertices x and y. All we need is that for each such pair there is a sequence of links 
between x and some x', between x' and some x", and so forth, until eventually at some 
point we reach y. We will call such a sequence a "chain". 

Another way to think of this observation is that given a graph G we can create a new 
graph H as follows. The vertex set V(H) equals V(G), and if x and y are connected by a 
link in G, we put an edge between x and y into H . So, while edges in G exist only between 
vertices of different color, edges in H exist only between vertices of the same color (in G). 
The "key observation" is then just that in order to easily select the set of red vertices in G, 
we do not need red to be a clique in H, just a connected set. So, the simple algorithm is as 
follows. 

Algorithm Chain 

Given: A graph G = (V, E). 
Output: A 3-coloring of G or failure. 

1. Create graph $H = (V, F)$, where

$$F = \{(x,y) \mid \exists \text{ a link in } G \text{ between } x \text{ and } y\}.$$

2. Find all connected components in H. If there are exactly three, halt with success, 
producing as output the vertices of the three components labeled as red, blue, 
and green. 

Otherwise, if there are more than 3 components, then halt with failure. 
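Algorithm Chain is short enough to sketch in full. The Python version below is a hypothetical illustration assuming a dict-of-neighbor-sets graph with integer vertices. It uses the fact that $x$ and $y$ are joined by a link exactly when they are distinct common neighbors of the two endpoints of some edge $(u,v)$, and it merges components with a union-find structure instead of building $H$ explicitly. Returning failure when there are fewer than three components is our choice; the algorithm as stated covers only the cases of exactly three and of more than three.

```python
from itertools import combinations
from typing import Optional

Graph = dict[int, set[int]]


def chain_components(g: Graph) -> list[set[int]]:
    """Connected components of H, where x and y are adjacent in H iff G has a link x-u-v-y."""
    parent = {v: v for v in g}

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for u in g:
        for v in g[u]:
            if u < v:  # visit each edge (u, v) once
                for x, y in combinations(sorted(g[u] & g[v]), 2):
                    parent[find(x)] = find(y)  # x, u, v, y form a link
    groups: dict[int, set[int]] = {}
    for w in g:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


def chain_color(g: Graph) -> Optional[dict[int, int]]:
    """Algorithm Chain: color by the components of H if there are exactly three."""
    comps = chain_components(g)
    if len(comps) != 3:
        return None
    return {w: i for i, comp in enumerate(comps) for w in comp}
```

A triangle-free graph such as the 5-cycle has no links at all, so every vertex is its own component and the sketch reports failure; in $K_{2,2,2}$ each edge's two common neighbors form one color class.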
8.3.1 Motivation

As mentioned above, each connected component in the graph $H$ produced by Algorithm Chain consists of vertices that must be the same color under any legal 3-coloring of $G$. The following sections contain a proof that when $p \ge n^{-0.6+\epsilon}$, with high probability there will be only 3 such components in $H$. Let us first, however, give a motivational argument, supposing that each edge between two vertices of the same color were placed independently with the same probability into $H$.

Let $p = n^{-0.6+\epsilon}$ and, for simplicity, assume that $\epsilon < 1/5$. Given two vertices $x$ and $y$ of the same color (say red) in $G$, the expected number of links between $x$ and $y$ in $G$ is $\Theta(n^2p^5) = \Theta(n^{-1+5\epsilon})$. For $\epsilon < 1/5$, the probability that there exists a link between $x$ and $y$, and thus the probability that $x$ and $y$ are neighbors in $H$, is $\Theta(n^{-1+5\epsilon})$ as well. Thus, if we consider the subgraph of $H$ induced by the set red, the average degree of each vertex is $\Theta(n^{5\epsilon})$. It is well known that in the random graph model $\mathcal{G}(n,p)$, once the average degree exceeds $K \log n$ for sufficiently large $K$, the graph is connected with high probability. So, if the edges in the red subgraph of $H$ were placed randomly, the red set would be a single connected component almost surely, since $n^{5\epsilon} > K \log n$.
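The exponent arithmetic behind this heuristic, written out (taking $|\mathrm{red}| = \Theta(n)$ as in the balanced model and ignoring constant factors):

```latex
\begin{aligned}
n^2 p^5 &= n^{2 + 5(-3/5 + \epsilon)} = n^{-1 + 5\epsilon}
  && \text{(expected links between fixed } x, y\text{)}\\
\mathbb{E}\bigl[\deg_H(x) \text{ within red}\bigr] &= \Theta(n) \cdot \Theta(n^{-1 + 5\epsilon}) = \Theta(n^{5\epsilon}) \\
\Theta(n^{5\epsilon}) &> K \log n
  && \text{for any fixed } K \text{ and all sufficiently large } n
\end{aligned}
```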

8.3.2 Janson's inequality 

Janson's inequality [11] is used in the following setting. Consider a universe U of points 
and a collection of subsets X 1? . . . , X m of U. We now create a new subset S of U by placing 
each j € U into S independently with probability p. Let A; be the event that X; C S. 
Janson's inequality bounds the probability that no set X, is contained within S: that is, 
the probability that no event A,- occurs. 2 
Define:

$$M = \prod_{i=1}^{m} \Pr[\overline{A_i}].$$

If the $X_i$ were all disjoint, then the events $A_i$ would all be independent and so $M$ would be the probability that no $A_i$ occurs. If the $X_i$ are not disjoint, then the events $A_i$ are not independent. However, Janson's inequality allows us to bound the probability that no $X_i$ is contained in $S$ by looking only at pairwise dependencies. In particular, Janson's inequality states that:

$$M \le \Pr[\text{no } X_i \text{ is contained in } S] \le M e^{\Delta/(2(1-\epsilon))} \qquad (8.1)$$

where $\epsilon$ is an upper bound on $\Pr[A_i]$ and we define:

• $\Delta = \sum_{i \sim j} \Pr[A_i \wedge A_j]$, where the sum ranges over ordered pairs $(i,j)$ with $i \ne j$ and $X_i \cap X_j \ne \emptyset$.

Notice that if $\epsilon < 1/2$ and $\Delta = o(1)$, then by equation (8.1), $\Pr[\text{no } X_i \text{ is contained in } S] = M(1 + o(1))$. That is, under these two conditions, the probability is within a factor $1 + o(1)$ of what it would be had the $A_i$ been independent.

²Janson's inequality works even if the probabilities for each point $j$ are different, so long as the points are placed into $S$ independently. We will not need this fact for our purposes.

In the study of random graphs, Janson's inequality is often used to show that some structure exists with high probability. For example, suppose one wishes to prove that the graph $G \leftarrow \mathcal{G}(n,p)$ contains a triangle with high probability for $p \gg 1/n$. In such a setting, we let $U$ be the set of all edges of the $n$-clique $K_n$ (thought of as possible edges in $G$) and have one $X_i$ for each set of three edges corresponding to a triangle. Janson's inequality then provides an upper bound on the probability that no triangle is contained in $G$. In the setting of this thesis, we will use Janson's inequality to prove that in the balanced semi-random model, for sufficiently large noise rate $p$, for any $x, y \in \mathrm{red}$ there will be a chain between $x$ and $y$ with probability at least $1 - o(n^{-2})$.
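For small $n$ the triangle example can be checked exactly. The Python sketch below is our illustration, with $n = 5$ and $p = 0.3$ chosen arbitrarily: it enumerates all $2^{10}$ edge subsets of $K_5$ to compute $\Pr[\text{no triangle}]$ and compares it with $M$ and with the upper bound of (8.1), taking $\epsilon = p^3$ (every $\Pr[A_i]$ equals $p^3$) and summing $\Delta$ over ordered pairs of distinct triangles that share an edge.

```python
import math
from itertools import combinations


def janson_triangles(n: int, p: float) -> tuple[float, float, float]:
    """Return (M, exact Pr[no triangle], Janson upper bound) for G(n, p)."""
    edges = list(combinations(range(n), 2))
    index = {e: i for i, e in enumerate(edges)}
    # One set X_i of edge indices per triangle of K_n.
    triangles = [frozenset({index[(a, b)], index[(a, c)], index[(b, c)]})
                 for a, b, c in combinations(range(n), 3)]
    exact = 0.0
    for mask in range(1 << len(edges)):  # every possible edge set S
        if not any(all(mask >> i & 1 for i in t) for t in triangles):
            present = bin(mask).count("1")
            exact += p ** present * (1 - p) ** (len(edges) - present)
    eps = p ** 3                                   # upper bound on Pr[A_i]
    m = (1 - eps) ** len(triangles)                # M = prod of Pr[not A_i]
    delta = sum(p ** len(s | t) for s in triangles for t in triangles
                if s != t and s & t)               # ordered pairs sharing an edge
    return m, exact, m * math.exp(delta / (2 * (1 - eps)))


m, exact, upper = janson_triangles(5, 0.3)
print(m <= exact <= upper)  # True
```

Pairs of triangles that share only a vertex have disjoint edge sets, so they are independent and correctly contribute nothing to $\Delta$.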

The following definitions are taken (roughly) from Spencer [35].³

Definition 8.2 Let $H$ be a graph in which some subset $R$ of its vertices is specified to be "roots", and $H$ has no edges between vertices in $R$. We will call the pair $(R,H)$ a rooted graph, or simply say that $H$ is a rooted graph when $R$ is implicit. Define $\mathrm{edges}(H)$ to be the total number of edges in $H$ and $\mathrm{nonroots}(H)$ to be the number of vertices in $H$ excluding roots. Define the density of $H$ to be $\mathrm{dens}(H) = \mathrm{edges}(H)/\mathrm{nonroots}(H)$.

We will always consider rooted graphs to be graphs on a constant number of vertices, and examine the number of copies of such graphs in larger $n$-vertex graphs.

Definition 8.3 A rooted graph $(R,H)$ is strictly balanced if for some constant $\epsilon' > 0$, for every proper subgraph $(R,H')$, we have $\mathrm{dens}(H') \le \mathrm{dens}(H) - \epsilon'$. (By a proper subgraph, we mean that $H' \subsetneq H$.)
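For a rooted graph on a constant number of vertices, Definition 8.3 can be checked by brute force. In the Python sketch below (our illustration, with hypothetical helper names), only vertex subsets need to be tried: for a fixed set $W$ of non-roots, removing edges can only lower the density, so the densest subgraph on $R \cup W$ is the induced one, and subgraphs with no non-roots are skipped since their density is undefined. With finitely many proper subgraphs, a strict inequality for each one yields the constant $\epsilon'$. The example confirms the remark of Section 7.1.2 that a path of length $l$ between two roots has density $l/(l-1)$ and is strictly balanced.

```python
from fractions import Fraction
from itertools import combinations


def dens(edges: list[tuple[int, int]], nonroots) -> Fraction:
    """dens(H) = edges(H) / nonroots(H)."""
    return Fraction(len(edges), len(nonroots))


def is_strictly_balanced(roots: set[int], edges: list[tuple[int, int]]) -> bool:
    """Brute-force test of Definition 8.3 for a small rooted graph (R, H)."""
    nonroots = sorted({x for e in edges for x in e} - roots)
    full = dens(edges, nonroots)
    for size in range(1, len(nonroots)):          # nonempty, proper subsets W
        for w in combinations(nonroots, size):
            keep = roots | set(w)
            induced = [(a, b) for a, b in edges if a in keep and b in keep]
            if dens(induced, w) >= full:
                return False
    return True


def path(l: int) -> list[tuple[int, int]]:
    """Path 0 - 1 - ... - l; its roots are the endpoints 0 and l."""
    return [(i, i + 1) for i in range(l)]


for l in range(2, 7):
    assert dens(path(l), range(1, l)) == Fraction(l, l - 1)
    assert is_strictly_balanced({0, l}, path(l))
```

Brute force is exponential in the number of non-roots, which is harmless here because rooted graphs are always of constant size.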

Definition 8.4 Suppose $(R,H)$ is a rooted graph and $G = (V,E)$ is a graph with $V \supseteq R$. An image of $H$ over $R$ in $G$ is a subgraph of $G$ isomorphic to $H$ by a map which is the identity on $R$. When $R$ is clear from context, we will drop the phrase "over $R$".

So, for example, if H is a triangle with a root vertex x, then the images of H over {x} in 
G are all triangles in G containing vertex x. 



³The term "image" used here is essentially the same as an "extension" in Spencer's paper, except that he counts maps while we count images of maps.

Definition 8.5 Suppose $(R,H)$ is some strictly balanced rooted graph and $V$ is a set of $n$ vertices containing $R$. Let $X_1, \ldots, X_m$ denote all distinct images of $H$ in the clique $K_n$ on $V$. That is, the $X_i$ are all possible images of $H$ fixing root set $R$ in an $n$-vertex graph. For some model $M$ (such as $\mathcal{G}(n,p)$ or $\mathcal{G}_{SB}(n,p,k)$), we define:

$$\Delta(H, M) = \sum_{\text{ordered pairs } i \sim j} \Pr[X_i \subseteq G \text{ and } X_j \subseteq G \mid G \leftarrow M],$$

where $i \sim j$ means that $i \ne j$ and $X_i$ and $X_j$ share at least one edge.

Spencer [36] [35] proves the following theorem for random graphs $\mathcal{G}(n,p)$.

Theorem 8.2 (Spencer) Let $(R,H)$ be a strictly balanced rooted graph on a constant number of vertices with $v = \mathrm{nonroots}(H)$ and $e = \mathrm{edges}(H)$. Then there exists $\epsilon^* > 0$ such that if $p \le n^{-v/e+\epsilon^*}$, then $\Delta(H, \mathcal{G}(n,p)) = o(1)$.

Spencer then uses this fact to prove that with high probability, for $p = n^{-v/e+\epsilon^*}$, $G$ will contain some image of $H$.

We can use Spencer's theorem to prove the following. 

Theorem 8.3 Let $\mathcal{G}_{SB}^A(n,p,k)$ be the semi-random model with an adversary $A$ that always elects not to place edges into the graph. Let $(R,H)$ be a strictly balanced rooted graph on a constant number of vertices with $v = \mathrm{nonroots}(H)$ and $e = \mathrm{edges}(H)$. Then there exists $\epsilon^* > 0$ such that if $p = n^{-v/e+\epsilon^*}$, then $\Delta(H, \mathcal{G}_{SB}^A(n,p,k)) = o(1)$.

Proof: Theorem 8.3 follows immediately from Spencer's theorem (Theorem 8.2). Let $X_1, \ldots, X_m$ denote the images of $H$ in the clique $K_n$ and let $A_i$ be the event that $X_i \subseteq G$. Each edge $(x,y)$ is placed into $G$ with probability at most $p$ (probability 0 if $x$ and $y$ are in the same color class, and probability $p$ if they are in different color classes). Thus, for any pair of events $A_i, A_j$, we have:

$$\Pr[A_i \wedge A_j \mid G \leftarrow \mathcal{G}_{SB}^A(n,p,k)] \le \Pr[A_i \wedge A_j \mid G \leftarrow \mathcal{G}(n,p)].$$

For the sake of completeness, however, we provide a direct proof here as well, following the argument of Spencer [35].

We prove the theorem by considering separately, for each fixed value of $s \in \{1, \ldots, v\}$, the pairs $X_i, X_j$ that share $s$ vertices in common in addition to the roots. Note that if $s = 0$, then the graphs $X_i$ and $X_j$ share no edges and so are not counted in the summation in Definition 8.5. The number of pairs $X_i$ and $X_j$ sharing $s$ vertices in common in addition to the roots is $O(n^{2v-s})$, since there are $2v - s$ different non-root vertices and only a constant number of permutations.

Let $\epsilon' > 0$ be a value such that every proper subgraph of $H$ containing the roots has density at most $\mathrm{dens}(H) - \epsilon'$ (see the definition of "strictly balanced"). If $s = v$, then since $X_i$ and $X_j$ are distinct, there must be at least $e + 1$ edges in $X_i \cup X_j$. For any fixed edge, the probability that the edge belongs to $G$ is at most $p$ (it could be smaller, e.g. 0, if the two endpoints are in the same color class in the graph). So, the contribution to the summation from such pairs $X_i$ and $X_j$ is at most:

$$O(n^v p^{e+1}) = O\left(n^{v + (e+1)(-v/e+\epsilon^*)}\right) = O\left(n^{-v/e + (e+1)\epsilon^*}\right) = o(1)$$

for $\epsilon^*$ sufficiently small.

Now consider $s < v$ and fix a pair $X_i$ and $X_j$ sharing $s$ vertices in common in addition to the roots. Let $S$ be the subgraph $X_i \cap X_j$; that is, $V(S) = V(X_i) \cap V(X_j)$ and $E(S) = E(X_i) \cap E(X_j)$. Since $(R,H)$ is strictly balanced, we know that $|E(S)|/s \le e/v - \epsilon'$ for some $\epsilon' > 0$. So, $|E(X_i) \cup E(X_j)| = 2e - |E(S)| \ge 2e - se/v + s\epsilon'$, and thus the probability that both $X_i$ and $X_j$ are subgraphs of $G$ is at most $p^{2e - se/v + s\epsilon'}$.

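Finally, summing over all $O(n^{2v-s})$ pairs $X_i$ and $X_j$ sharing $s$ vertices in common besides the endpoints, the contribution to $\Delta$ is at most:

$$\begin{aligned}
O\big(n^{2v-s} p^{2e - se/v + s\epsilon'}\big) &= O\big(n^{2v-s} (n^{-v/e+\epsilon^*})^{2e - se/v + s\epsilon'}\big) \\
&= O\big(n^{2v - s - 2v + s - s\epsilon' v/e + (2e - se/v + s\epsilon')\epsilon^*}\big) \\
&= O\big(n^{-s\epsilon' v/e + (2e - se/v + s\epsilon')\epsilon^*}\big) \\
&= o(1) \quad (\text{for } \epsilon^* \text{ sufficiently small}).
\end{aligned}$$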

Thus, the contribution to $\Delta$ from each value of $s \in [1, v]$ is $o(1)$, and since there are only a constant number of different choices for $s$, we have $\Delta(H, \mathcal{G}_{SB}(n,p,k)) = o(1)$. ■
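The exponent bookkeeping in the two cases above can be checked mechanically. The following sketch is only an illustrative sanity check, not part of the original proof: it evaluates, with exact rational arithmetic, the exponent of $n$ in each bound when $p = n^{-v/e+\epsilon^*}$. The names `delta_exponent` and `closed_form` are hypothetical, and the sample values ($v = 5$, $e = 10$, as for a chain of length 2, with arbitrary $\epsilon'$ and $\epsilon^*$) are assumptions chosen for the demonstration.

```python
from fractions import Fraction


def delta_exponent(v, e, eps_prime, s, eps_star):
    """Exponent of n in the bound on the contribution to Delta from pairs
    sharing s non-root vertices, assuming p = n^(-v/e + eps_star)."""
    p_exp = Fraction(-v, e) + eps_star              # p = n^{p_exp}
    if s == v:
        # O(n^v) such pairs, each needing at least e + 1 edges.
        return v + (e + 1) * p_exp
    # O(n^{2v-s}) pairs, each needing at least 2e - se/v + s*eps' edges.
    return (2 * v - s) + (2 * e - Fraction(s * e, v) + s * eps_prime) * p_exp


def closed_form(v, e, eps_prime, s, eps_star):
    """Simplified exponent -s*eps'*v/e + (2e - se/v + s*eps')*eps* (s < v)."""
    return (-s * eps_prime * Fraction(v, e)
            + (2 * e - Fraction(s * e, v) + s * eps_prime) * eps_star)


# Illustrative values: v = 5 non-roots, e = 10 edges, eps' = 1/10, eps* = 1/1000.
v, e, eps_prime, eps_star = 5, 10, Fraction(1, 10), Fraction(1, 1000)
for s in range(1, v + 1):
    print(s, float(delta_exponent(v, e, eps_prime, s, eps_star)))
```

Every printed exponent is negative, so each term is $o(1)$ for these values; making $\epsilon^*$ larger eventually makes some exponent positive, which is why the proof needs $\epsilon^*$ sufficiently small.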

We will use this fact in the next section to prove that in balanced semi-random 3-colorable graphs, with high probability, there will exist chains between every pair of vertices $x$ and $y$ in the same color class.

8.3.3 The main theorem 

We now prove the following theorem. 

Theorem 8.4 Algorithm Chain will 3-color $G \leftarrow \mathcal{G}_{SB}(n,p,3)$ with high probability for $p \ge n^{-3/5+\epsilon}$, for any constant $\epsilon > 0$.

The idea of the proof is to consider chains of some length $r$ between two fixed endpoints (roots) $x$ and $y$ and to prove that with probability $1 - o(n^{-2})$ at least one such chain exists in $G$. This will be done by showing that chains are strictly balanced and then applying Theorem 8.3 and Janson's inequality.

Before proving Theorem 8.4, however, let us first formally define a chain and prove a 
few preliminary lemmas. 
Figure 8.3: A chain of length $r$ between $w_0$ and $w_r$.

Definition 8.6 A chain $C$ of length $r$ is a rooted graph on $3r+1$ vertices $\{w_0, u_0, v_0, w_1, u_1, v_1, \ldots, w_{r-1}, u_{r-1}, v_{r-1}, w_r\}$ and $5r$ edges, where:

$$E(C) = \bigcup_{i=0}^{r-1} \{(w_i,u_i), (w_i,v_i), (u_i,v_i), (u_i,w_{i+1}), (v_i,w_{i+1})\}.$$

(See Figure 8.3.) Vertices $w_0$ and $w_r$ are the roots of the chain and will be called the endpoints.
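A small brute-force sketch can make Definition 8.6 concrete. The code below, with hypothetical function names, builds the vertex and edge sets of a chain exactly as listed above, and then checks, for tiny $r$ only, that every proper 3-coloring gives $w_0$ and $w_r$ the same color, which is the property Algorithm Chain relies on. It enumerates all $3^{3r+1}$ colorings, so it is a demonstration and not an efficient test.

```python
from itertools import product


def chain(r):
    """Vertices and edges of a chain of length r (Definition 8.6)."""
    vertices = [("w", i) for i in range(r + 1)]
    vertices += [(name, i) for i in range(r) for name in ("u", "v")]
    edges = set()
    for i in range(r):
        w, u, v, w_next = ("w", i), ("u", i), ("v", i), ("w", i + 1)
        edges |= {(w, u), (w, v), (u, v), (u, w_next), (v, w_next)}
    return vertices, edges


def roots_same_color_in_every_3_coloring(r):
    """Brute force: does every proper 3-coloring give w_0 and w_r one color?"""
    vertices, edges = chain(r)
    pos = {x: j for j, x in enumerate(vertices)}
    found_coloring = False
    for colors in product(range(3), repeat=len(vertices)):
        if any(colors[pos[a]] == colors[pos[b]] for a, b in edges):
            continue                        # not a proper coloring
        found_coloring = True
        if colors[pos[("w", 0)]] != colors[pos[("w", r)]]:
            return False
    return found_coloring


V, E = chain(4)
print(len(V), len(E))   # 13 20, i.e. 3r + 1 vertices and 5r edges
```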



Definition 8.7 If $G \leftarrow \mathcal{G}_{SB}(n,p,3)$, we will say that $C$ is a potential chain between two vertices $w_0$ and $w_r$ if all $w_i$ are in the same color class, and for each $i$, vertices $u_i$, $v_i$, and $w_i$ are all in different color classes.

Note that $\mathrm{nonroots}(C) = 3r - 1$ and $\mathrm{edges}(C) = 5r$, and there are no edges in $C$ between the roots. Also, note that the ratio $-\mathrm{nonroots}(C)/\mathrm{edges}(C) = -3/5 + 1/(5r)$, so if $p = n^{-3/5+\epsilon}$ as in the bound for Theorem 8.4, then for $C$ a sufficiently long chain we will have $p = n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\epsilon'}$ for some $\epsilon' > 0$. This is the form of the condition on $p$ in Theorem 8.3 for proving that $\Delta = o(1)$. Our immediate goal is thus to prove that chains are strictly balanced.

Fact 8.8 If $C$ is a chain of length $r$, then $\mathrm{dens}(C) = \dfrac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} = \dfrac{5}{3} + \dfrac{5}{3\,\mathrm{nonroots}(C)}$.

The following is a useful fact about subgraphs of chains. 

Lemma 8.5 Let $S$ be a subgraph of a chain $C$. Then $|E(S)| \le \frac{5}{3}(|V(S)| - 1)$.

Proof: Let $C$ have vertex set $\{w_0, u_0, v_0, \ldots, w_{r-1}, u_{r-1}, v_{r-1}, w_r\}$. Let $L_i$ be the $i$th link in $C$; that is, $L_i = C|_{\{w_i, u_i, v_i, w_{i+1}\}}$. For convenience, partition the vertices of $S$ into disjoint sets $V_i = V(S) \cap [V(L_i) - \{w_{i+1}\}]$, and partition the edges of $S$ into disjoint sets $E_i = E(S) \cap E(L_i)$, for $0 \le i < r$. So, $S = (\bigcup_i V_i, \bigcup_i E_i)$.

For a given index $i$, if $V_i$ is not empty and $w_{i+1} \in V(S)$, then $|E_i|/|V_i| \le 5/3$. One can easily check that the maximum value of this ratio occurs when $E_i$ and $V_i$ are both "full" (sizes 5 and 3 respectively). If $V_i$ is non-empty but $w_{i+1} \notin V(S)$, then: if $|V_i| = 3$ we have $|E_i| \le 3$, if $|V_i| = 2$ we have $|E_i| \le 1$, and if $|V_i| = 1$ we have $|E_i| = 0$.

Since there must exist some $i$ such that $V_i$ is non-empty and $w_{i+1} \notin V(S)$, this implies:

$$|E(S)| \le \max\left\{\tfrac{5}{3}(|V(S)| - 3) + 3,\ \tfrac{5}{3}(|V(S)| - 2) + 1,\ \tfrac{5}{3}(|V(S)| - 1) + 0\right\} = \tfrac{5}{3}(|V(S)| - 1). \quad ■$$
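Lemma 8.5 can be spot-checked exhaustively on short chains. Because the bound is an upper bound on edges, it suffices to look at vertex-induced subgraphs, and the sketch below does exactly that. It is an illustrative check for small $r$ only (it enumerates all $2^{3r+1}$ vertex subsets); the integer labelling $w_i = 3i$, $u_i = 3i+1$, $v_i = 3i+2$ and the function names are assumptions of this sketch.

```python
def chain_graph(r):
    """Chain of length r with labels w_i = 3i, u_i = 3i+1, v_i = 3i+2;
    the last root w_r gets label 3r, so there are 3r + 1 vertices."""
    edges = []
    for i in range(r):
        w, u, v, w_next = 3 * i, 3 * i + 1, 3 * i + 2, 3 * (i + 1)
        edges += [(w, u), (w, v), (u, v), (u, w_next), (v, w_next)]
    return 3 * r + 1, edges


def check_lemma_8_5(r):
    """Check |E(S)| <= (5/3)(|V(S)| - 1) for every non-empty induced S."""
    n, edges = chain_graph(r)
    for mask in range(1, 1 << n):
        size = bin(mask).count("1")
        inside = sum(1 for a, b in edges if (mask >> a) & 1 and (mask >> b) & 1)
        if 3 * inside > 5 * (size - 1):     # integer form of the inequality
            return False
    return True


print([check_lemma_8_5(r) for r in (1, 2, 3)])
```

The whole chain meets the bound with equality ($5r$ edges on $3r+1$ vertices), which is why the strict gap in Lemma 8.6 has to come from the missing vertices.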

We can now use Lemma 8.5 to prove that chains are strictly balanced, which allows easy application of Janson's inequality.

Lemma 8.6 Let $S$ be a subgraph of a chain $C$ of some constant length $r$ such that $V(S)$ contains the endpoints $w_0$ and $w_r$ but $V(S)$ does not contain all the vertices of $C$. Then, for some constant $\epsilon' = \epsilon'(r) > 0$,

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon'.$$

That is, chains are strictly balanced.

Proof: Since we are giving an upper bound on the number of edges in $S$, we may as well assume that $S$ is a vertex-induced subgraph.

First, suppose $S$ consists of just one connected component. So, $S$ contains vertices $\{w_0, w_1, \ldots, w_r\}$ and at least one of $\{u_i, v_i\}$ for each $i < r$. Thus, for each vertex in $C$ missing from $S$, there must be at least 3 edges of $C$ missing from $S$ as well: if $u_i \notin V(S)$ then $(w_i, u_i), (u_i, v_i), (u_i, w_{i+1}) \notin E(S)$. So, if $m$ vertices of $C$ are missing from $S$, then:

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C) - 3m}{\mathrm{nonroots}(C) - m} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon' \quad \text{for some } \epsilon' > 0,$$

because $\mathrm{edges}(C)/\mathrm{nonroots}(C) < 3$ and both $\mathrm{edges}(C)$ and $\mathrm{nonroots}(C)$ are constant.

If $S$ consists of more than one connected component, then $w_0$ and $w_r$ cannot be in the same component since we have assumed that $S$ is vertex-induced. We can thus partition $S$ into two disjoint subgraphs: $S_{\mathrm{start}}$ and $S_{\mathrm{rest}}$, where $S_{\mathrm{start}}$ is the component containing $w_0$ and $S_{\mathrm{rest}}$ is everything else (and need not be connected). So, $\mathrm{nonroots}(S) = |V(S)| - 2 = |V(S_{\mathrm{start}})| + |V(S_{\mathrm{rest}})| - 2$. Applying Lemma 8.5, we get:

$$\begin{aligned}
\mathrm{edges}(S) &= |E(S_{\mathrm{start}})| + |E(S_{\mathrm{rest}})| \\
&\le \tfrac{5}{3}(|V(S_{\mathrm{start}})| - 1) + \tfrac{5}{3}(|V(S_{\mathrm{rest}})| - 1) \\
&= \tfrac{5}{3}(|V(S)| - 2) \\
&= \tfrac{5}{3}\,\mathrm{nonroots}(S).
\end{aligned}$$
So $\dfrac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \dfrac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon'$ for $\epsilon' = \dfrac{5}{3\,\mathrm{nonroots}(C)}$, by Fact 8.8. ■
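Strict balance can also be confirmed exhaustively for short chains. The sketch below, an illustration rather than part of the proof, computes the largest ratio $\mathrm{edges}(S)/\mathrm{nonroots}(S)$ over vertex-induced $S$ that contain both roots, omit at least one vertex of $C$, and have at least one non-root, and compares it with $\mathrm{dens}(C) = 5r/(3r-1)$. The labelling and function names are assumptions of this sketch, and the enumeration is exponential in $r$.

```python
from fractions import Fraction


def chain_graph(r):
    """Chain of length r: w_i = 3i, u_i = 3i+1, v_i = 3i+2; roots 0 and 3r."""
    edges = []
    for i in range(r):
        w, u, v, w_next = 3 * i, 3 * i + 1, 3 * i + 2, 3 * (i + 1)
        edges += [(w, u), (w, v), (u, v), (u, w_next), (v, w_next)]
    return 3 * r + 1, edges


def max_proper_density(r):
    """Largest edges(S)/nonroots(S) over induced S that contain both roots,
    miss at least one vertex of C, and have at least one non-root."""
    n, edges = chain_graph(r)
    roots = (1 << 0) | (1 << (3 * r))
    best = None
    for mask in range((1 << n) - 1):          # every S except V(C) itself
        if (mask & roots) != roots:
            continue
        nonroots = bin(mask).count("1") - 2
        if nonroots == 0:
            continue
        inside = sum(1 for a, b in edges if (mask >> a) & 1 and (mask >> b) & 1)
        ratio = Fraction(inside, nonroots)
        best = ratio if best is None else max(best, ratio)
    return best


for r in (1, 2, 3):
    print(r, Fraction(5 * r, 3 * r - 1), max_proper_density(r))
```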

Proof of Theorem 8.4: First, it is clear from the description of Algorithm Chain that adding further edges to the graph $G$ cannot decrease the probability of success. Therefore, in order to prove Theorem 8.4, it is enough to prove that Algorithm Chain succeeds when each potential edge is placed into the graph with probability exactly $p$; that is, when the adversary $A$ always chooses not to place edges into the graph. Similarly, it is enough to prove the theorem when $p$ exactly equals $n^{-3/5+\epsilon}$ for some constant $\epsilon > 0$. Let $r \in \mathbb{Z}$ and $\epsilon > 0$ be constants such that

$$p = n^{-3/5 + 1/(5r) + \epsilon}. \qquad (8.2)$$

By Lemma 8.6, chains $C_r$ of length $r$ are strictly balanced. So, by Theorem 8.3, there exists $\epsilon^* > 0$ such that if $p \le n^{-\mathrm{nonroots}(C_r)/\mathrm{edges}(C_r)+\epsilon^*}$ then $\Delta = \Delta(C_r, \mathcal{G}_{SB}(n,p,3)) = o(1)$. Because additional edges cannot decrease the probability of success, we may for the purposes of analysis assume that:

$$\epsilon < \epsilon^*. \qquad (8.3)$$

Thus, since $-\mathrm{nonroots}(C_r)/\mathrm{edges}(C_r) = -(3r-1)/(5r) = -3/5 + 1/(5r)$, we have by Theorem 8.3 that:

$$\Delta = o(1). \qquad (8.4)$$

Fix two points $x$ and $y$ in the same color class in $G$; without loss of generality, say $x, y \in$ red. We now show that with probability $1 - o(n^{-2})$, $x$ and $y$ are connected by a chain of length $r$, with $r$ as in equation (8.2). By a union bound over the $O(n^2)$ such pairs, this will immediately imply that Algorithm Chain succeeds.

In fact, the analysis we provide to show that with high probability there is a chain between $x$ and $y$ holds more generally in the random model $\mathcal{G}(n,p)$ for any strictly balanced rooted graph $H$ where $p \ge n^{-\mathrm{nonroots}(H)/\mathrm{edges}(H)+\epsilon}$. This general fact is proven by Spencer [36][35]. The proof of Spencer's theorem can be seen to hold in the $\mathcal{G}_{SB}(n,p,k)$ model as well, so long as the number of images of $H$ containing no edges between vertices in the same color class is $\Theta(n^{\mathrm{nonroots}(H)})$. This is the case for chains, but is not the case, for example, for non-$k$-colorable graphs. For completeness, we provide a direct proof for chains along the lines of Spencer's arguments here.

Label each potential chain of length $r$ between $x$ and $y$ as some $C_i$ ($1 \le i \le m$), where the number of such potential chains is $m = 2^r(\Theta(n))^{3r-1}(1 - o(1))$, since there are two color choices for each $(u_i, v_i)$ pair, there are $\Theta(n)$ vertices in each color class in $\mathcal{G}_{SB}(n,p,3)$, and $r$ is constant. Thus, $m = \Theta(n^{3r-1})$. We bound the probability that $x$ and $y$ are not connected by a chain of length $r$ by applying Janson's inequality. The universe $U$ corresponds to the set of $O(n^2)$ potential edges in $G$ and the sets $X_i$ correspond to the potential chains $C_i$ of length $r$ between $x$ and $y$, where $X_i$ is the set of all edges in $C_i$. Every $C_i$ has $5r$ edges, each in $G$ with probability $p$. So,

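$$\begin{aligned}
M &= \prod_{i=1}^{m} (1 - p^{5r}) = (1 - p^{5r})^{\Theta(n^{3r-1})} \\
&\le e^{-\Theta(n^{-3r+1+5r\epsilon+3r-1})} \\
&= e^{-\Theta(n^{5r\epsilon})} \\
&= o(1/n^2). \qquad (8.5)
\end{aligned}$$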

Let us now consider the term $e^{\Delta/(1-\delta)}$ in Janson's inequality. For a fixed potential chain $C_i$, the value $\delta = \Pr[C_i \subseteq G] = p^{5r} = o(1)$. By our choice of $\epsilon$, we have $\Delta = o(1)$ as well (equation (8.4)). Thus, $e^{\Delta/(1-\delta)} = 1 + o(1)$.

We now apply Janson's inequality using the bound on the above term together with the bound on $M$ in equation (8.5). We thus get that $\Pr[x$ and $y$ are not connected by a chain of length $r$ in $G] \le M e^{\Delta/(1-\delta)} = M(1 + o(1))$, which is $o(1/n^2)$. So, with high probability, all pairs of vertices from the same color class are connected by some such chain and Algorithm Chain succeeds. ■
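The exponent arithmetic behind equation (8.5) can be checked exactly. The sketch below is an illustration with hypothetical names: it computes the exponent of $n$ in $p$ from (8.2), in $\Pr[C_i \subseteq G] = p^{5r}$, and in $m\,p^{5r}$ with $m = \Theta(n^{3r-1})$, and confirms the last equals $5r\epsilon$. Combined with $1 - x \le e^{-x}$, this gives $(1 - p^{5r})^m \le e^{-\Theta(n^{5r\epsilon})}$. The sample values of $r$ and $\epsilon$ are arbitrary.

```python
from fractions import Fraction


def chain_exponents(r, eps):
    """Exponents of n for p (eq. 8.2), for p^{5r}, and for m * p^{5r}."""
    p_exp = Fraction(-3, 5) + Fraction(1, 5 * r) + eps   # p = n^{p_exp}
    single = 5 * r * p_exp                               # Pr[C_i in G] = p^{5r}
    expected_count = (3 * r - 1) + single                # m = Theta(n^{3r-1})
    return p_exp, single, expected_count


eps = Fraction(1, 100)
for r in (2, 5, 10):
    print(r, *chain_exponents(r, eps))
```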



8.4 A better algorithm: general k 

We can extend the results of the previous section to graphs of higher chromatic number $k$. A simple way to do this is just to replace the notion of a "link" by that of a "$t$-link", defined as follows.

Definition 8.9 A $t$-link, for some constant $t$, is a $(t+2)$-vertex graph consisting of two vertices $x$ and $y$, called the endpoints, both fully connected to a $t$-clique. (See Figure 8.4.)

Equivalently, a $t$-link between $x$ and $y$ is a $(t+2)$-clique with the edge $(x, y)$ removed.

Note that if two vertices in a $k$-colorable graph are endpoints of a $(k-1)$-link, then they must be the same color in any legal $k$-coloring: the $k-1$ clique vertices use $k-1$ distinct colors, so both endpoints must receive the one remaining color. Using this fact, we can get the following natural generalization of Algorithm Chain to graphs of constant chromatic number $k \ge 3$.
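The fact that a $(k-1)$-link forces its endpoints to share a color can be verified by brute force for small $k$. The sketch below, with hypothetical function names, builds a $t$-link as $K_{t+2}$ minus the endpoint edge and checks every proper $k$-coloring of a $(k-1)$-link; it enumerates $k^{k+1}$ colorings, so it is intended only for tiny cases.

```python
from itertools import combinations, product


def t_link(t):
    """Edges of a t-link: endpoints 0 and 1, clique vertices 2..t+1.
    Equivalently, K_{t+2} with the edge (0, 1) removed."""
    return [(a, b) for a, b in combinations(range(t + 2), 2) if (a, b) != (0, 1)]


def endpoints_forced_equal(k):
    """Brute force: in every proper k-coloring of a (k-1)-link,
    do the two endpoints receive the same color?"""
    t = k - 1
    edges = t_link(t)
    for colors in product(range(k), repeat=t + 2):
        if all(colors[a] != colors[b] for a, b in edges):
            if colors[0] != colors[1]:
                return False
    return True


print([endpoints_forced_equal(k) for k in (3, 4, 5)])
```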

Algorithm t-Chain

Given: $G = (V, E)$, a $k$-colorable graph.
Output: A $k$-coloring of $G$ or else failure.
1. Create graph $H = (V, F)$, where $F = \{(x, y) \mid \exists$ a $(k-1)$-link in $G$ between $x$ and $y\}$.

2. Find all connected components in $H$. If there are exactly $k$ components, then halt with success, producing those components as the color classes of $G$. Otherwise, if there are more than $k$ components, then halt with failure.

Figure 8.4: A 4-link between $x$ and $y$.

| $k$ | $p$ (fraction) | $p$ (decimal) |
|---|---|---|
| 3 | $n^{-3/5}$ | $n^{-0.6}$ |
| 4 | $n^{-4/9}$ | $n^{-0.444}$ |
| 5 | $n^{-5/14}$ | $n^{-0.357}$ |
| 6 | $n^{-6/20}$ | $n^{-0.3}$ |

Table 8.1: Algorithm t-Chain succeeds with high probability for $p$ at least this value times $n^{\epsilon}$.
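A minimal Python sketch of Algorithm t-Chain with $t = k-1$ follows. It assumes the graph is given as a list of adjacency sets on vertices $0, \ldots, n-1$; the names `has_clique` and `t_chain_coloring` are illustrative. A $(k-1)$-link between $x$ and $y$ is detected as a $(k-1)$-clique in their common neighborhood, found by brute force (polynomial for constant $k$, but not meant to be efficient), and components are tracked with union-find. The text only specifies the exactly-$k$ and more-than-$k$ cases; this sketch also returns the components when there are fewer than $k$, which is safe when $G$ really is $k$-colorable, because every component then lies inside a single color class.

```python
from itertools import combinations


def has_clique(candidates, adj, size):
    """True if some `size` vertices among `candidates` are pairwise adjacent."""
    for group in combinations(sorted(candidates), size):
        if all(b in adj[a] for a, b in combinations(group, 2)):
            return True
    return False


def t_chain_coloring(adj, k):
    """Sketch of Algorithm t-Chain (t = k - 1) on a graph given as adjacency sets.

    Step 1: join x and y in H when G has a (k-1)-link between them, i.e. when
    their common neighborhood contains a (k-1)-clique.
    Step 2: return the connected components of H as color classes, or None
    (failure) when there are more than k components."""
    n = len(adj)
    parent = list(range(n))                    # union-find over V

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    for x, y in combinations(range(n), 2):
        common = adj[x] & adj[y]
        if len(common) >= k - 1 and has_clique(common, adj, k - 1):
            parent[find(x)] = find(y)

    components = {}
    for v in range(n):
        components.setdefault(find(v), set()).add(v)
    classes = list(components.values())
    return classes if len(classes) <= k else None


# Octahedron K_{2,2,2}: every same-class pair is joined by a 2-link.
octahedron = [{2, 3, 4, 5}, {2, 3, 4, 5}, {0, 1, 4, 5},
              {0, 1, 4, 5}, {0, 1, 2, 3}, {0, 1, 2, 3}]
print(sorted(sorted(c) for c in t_chain_coloring(octahedron, 3)))  # [[0, 1], [2, 3], [4, 5]]
```

On a sparse graph with no links at all (for example a path on four vertices with $k = 3$), every vertex is its own component and the sketch reports failure, matching step 2.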

Theorem 8.7 Algorithm t-Chain $k$-colors $G \leftarrow \mathcal{G}_{SB}(n,p,k)$ for $p \ge n^{-2k/[k(k+1)-2]+\epsilon}$ ($\epsilon > 0$) with high probability. (See Table 8.1.)

In order to prove Theorem 8.7, we consider $(k-1)$-chains of some constant length $r$. We then prove, analogously to the previous section, that $(k-1)$-chains are strictly balanced, so Theorem 8.3 applies. We define a $t$-chain as follows.

Definition 8.10 A $t$-chain of length $r$ is a sequence of $r$ $t$-links connected at their endpoints. For a $t$-chain $C$ with fixed endpoints $x$ and $y$, we will treat the chain as a rooted graph, with $x$ and $y$ as the roots.

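Fact 8.11 If $C$ is a $t$-chain of length $r$, then $|V(C)| = r(t+1) + 1$, $\mathrm{nonroots}(C) = r(t+1) - 1$, and $|E(C)| = \mathrm{edges}(C) = r\left[\binom{t+2}{2} - 1\right] = \frac{r}{2}[(t+1)(t+2) - 2]$. So,

$$\frac{|E(C)|}{|V(C)| - 1} = \frac{\binom{t+2}{2} - 1}{t+1} = 1 + \frac{t}{2} - \frac{1}{t+1}.$$

Note that if $C_r$ is a $(k-1)$-chain of length $r$, then the term $-\frac{2k}{k(k+1)-2}$ in the statement of Theorem 8.7 equals $\lim_{r \to \infty} -\mathrm{nonroots}(C_r)/\mathrm{edges}(C_r)$.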
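The counts in Fact 8.11 can be cross-checked against an explicit construction. In the sketch below (an illustration; the labelling and names are assumptions), link endpoints are $0, \ldots, r$ with roots $0$ and $r$, and each link adds $t$ fresh clique vertices joined to both of its endpoints. The test compares the vertex, non-root, and edge counts, and the ratio $|E(C)|/(|V(C)|-1)$, with the closed forms above.

```python
from fractions import Fraction
from itertools import combinations


def t_chain_graph(t, r):
    """A t-chain of length r.  Link endpoints are 0..r (the roots are 0 and r);
    link i also has t fresh vertices forming a clique, each joined to i and i+1."""
    edges = set()
    next_label = r + 1
    for i in range(r):
        clique = list(range(next_label, next_label + t))
        next_label += t
        for a, b in combinations([i, i + 1] + clique, 2):
            if (a, b) != (i, i + 1):          # a t-link omits the endpoint edge
                edges.add((a, b))
    return next_label, edges                  # (number of vertices, edge set)


def fact_8_11(t, r):
    """Closed forms from Fact 8.11."""
    n_vertices = r * (t + 1) + 1
    nonroots = r * (t + 1) - 1
    edges = Fraction(r, 2) * ((t + 1) * (t + 2) - 2)
    dens1 = 1 + Fraction(t, 2) - Fraction(1, t + 1)   # |E| / (|V| - 1)
    return n_vertices, nonroots, edges, dens1


n, E = t_chain_graph(3, 2)
print(n, len(E), fact_8_11(3, 2))
```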
As in the proof of Theorem 8.4, the first step is to prove that $t$-chains are strictly balanced. We first prove a fact analogous to Lemma 8.5.

Lemma 8.8 Let $S$ be a subgraph of a $t$-chain $C$ of length $r$. Then:

$$|E(S)| \le \left[\frac{t}{2} + \frac{t}{t+1}\right](|V(S)| - 1).$$

Proof:

Let $L$ be a $t$-link and let $g(t) = \frac{t}{2} + \frac{t}{t+1}$. Define $\mathrm{dens}_1(H) = \frac{|E(H)|}{|V(H)| - 1}$, so:

$$\mathrm{dens}_1(L) = \mathrm{dens}_1(C) = g(t). \qquad (8.6)$$

Claim 1: If $S \subseteq L$, then $\mathrm{dens}_1(S) \le g(t)$.

Proof of 1: We may assume $S$ is vertex-induced. Thus, $S$ is either a $(t+2-c)$-clique or else $S$ is a $(t-c)$-link for some $c \ge 1$. In the latter case, the claim follows from equation (8.6) since $g(t)$ is an increasing function of $t$. In the former case, we have $\mathrm{dens}_1(S) = \frac{(t+2-c)(t+1-c)/2}{t+1-c} = \frac{t}{2} + 1 - \frac{c}{2} \le g(t)$ for $c \ge 1$. □

Now, we prove the lemma that if $S \subseteq C$ then $\mathrm{dens}_1(S) \le g(t)$ by induction on the length of $C$. The base case is proved in the above claim, so we may assume the lemma holds for any $t$-chain of length $r-1$. Let $C'$ be the $t$-chain of length $r-1$ consisting of the first $r-1$ links in $C$ and let $S'$ be $S$ restricted to $C'$. So, $\mathrm{dens}_1(S') \le g(t)$. Let $L$ be the last link in $C$ and let $S_L$ be $S$ restricted to $L$. So, $|E(S)| = |E(S')| + |E(S_L)|$ and $|V(S)| \ge |V(S')| + |V(S_L)| - 1$ (note: $S'$ and $S_L$ may share one vertex in common where $L$ joins $C'$). Thus,

$$\mathrm{dens}_1(S) \le \frac{|E(S')| + |E(S_L)|}{|V(S')| + |V(S_L)| - 2} = \frac{|E(S')| + |E(S_L)|}{[|V(S')| - 1] + [|V(S_L)| - 1]} \le g(t),$$

by induction and Claim 1. ■
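As with Lemma 8.5, Lemma 8.8 can be spot-checked exhaustively for small $t$ and $r$ over vertex-induced subgraphs (which suffice, since the lemma bounds edges from above). The sketch below uses exact fractions for $g(t) = t/2 + t/(t+1)$; the graph construction and function names are assumptions of this illustration, and the enumeration is exponential in the chain size.

```python
from fractions import Fraction
from itertools import combinations


def t_chain_graph(t, r):
    """t-chain of length r: endpoints 0..r, plus t clique vertices per link."""
    edges = set()
    next_label = r + 1
    for i in range(r):
        clique = list(range(next_label, next_label + t))
        next_label += t
        for a, b in combinations([i, i + 1] + clique, 2):
            if (a, b) != (i, i + 1):
                edges.add((a, b))
    return next_label, edges


def check_lemma_8_8(t, r):
    """|E(S)| <= g(t)(|V(S)| - 1) for every non-empty induced subgraph S."""
    n, edges = t_chain_graph(t, r)
    g = Fraction(t, 2) + Fraction(t, t + 1)
    for mask in range(1, 1 << n):
        size = bin(mask).count("1")
        inside = sum(1 for a, b in edges if (mask >> a) & 1 and (mask >> b) & 1)
        if inside > g * (size - 1):
            return False
    return True


print([check_lemma_8_8(t, r) for t, r in ((2, 2), (3, 2), (4, 2))])
```

For $t = 2$ the bound reduces to Lemma 8.5, since $g(2) = 1 + 2/3 = 5/3$.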

Lemma 8.9 Let $S$ be a subgraph of a $t$-chain $C$ of some constant length $r$ such that $V(S)$ contains the endpoints $x$ and $y$ but $V(S) \ne V(C)$. Then, for some constant $\epsilon' = \epsilon'(r) > 0$,

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon'.$$

That is, $t$-chains are strictly balanced.

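Proof: For $C$ a $t$-chain of length $r$, we have $\mathrm{edges}(C) = \frac{r}{2}[(t+1)(t+2) - 2] = \frac{r}{2}[t^2 + 3t]$. So, we may upper bound the density as follows:

$$\mathrm{dens}(C) = \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} = \frac{t^2 + 3t}{2t + 2 - 2/r} \le \frac{t^2 + 3t}{2t} = \frac{t+3}{2}.$$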
Again, we may assume that $S$ is a vertex-induced subgraph.

First, suppose $S$ consists of just one connected component. So, $S$ contains at least one non-endpoint for each $t$-link $L$ in $C$, and contains all link endpoints. Let us focus on some fixed $t$-link $L$ in $C$. If $S$ is missing $m$ vertices from that link, then it must be missing at least $(t+1) + t + (t-1) + \cdots + (t-m+2)$ edges from that link as well. So, the ratio of (edges missing) to (vertices missing) is at least $\frac{(t+1) + (t-m+2)}{2} \ge \frac{t+4}{2}$, for $m \le t-1$, the largest value of $m$ possible. Thus, if there are in total $m$ vertices in $C$ missing from $S$, then

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{\mathrm{edges}(C) - m(t+4)/2}{\mathrm{nonroots}(C) - m} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon' \quad \text{for some } \epsilon' > 0,$$

because $\frac{t+4}{2} \ge \frac{1}{2} + \mathrm{dens}(C)$.

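If $S$ consists of more than one connected component, then the endpoints of $C$ cannot be in the same component since we have assumed that $S$ is vertex-induced. We can thus partition $S$ into two disjoint subgraphs: $S_{\mathrm{start}}$ and $S_{\mathrm{rest}}$, where $S_{\mathrm{start}}$ contains root $x$ and $S_{\mathrm{rest}}$ is everything else. So, $\mathrm{nonroots}(S) = |V(S_{\mathrm{start}})| + |V(S_{\mathrm{rest}})| - 2$. Let $g(t) = \frac{t}{2} + \frac{t}{t+1}$. Applying Lemma 8.8, we get:

$$\begin{aligned}
\mathrm{edges}(S) &= |E(S_{\mathrm{start}})| + |E(S_{\mathrm{rest}})| \\
&\le g(t)(|V(S_{\mathrm{start}})| - 1) + g(t)(|V(S_{\mathrm{rest}})| - 1) \\
&= g(t)\,\mathrm{nonroots}(S).
\end{aligned}$$

So

$$\frac{\mathrm{edges}(S)}{\mathrm{nonroots}(S)} \le \frac{|E(C)|}{|V(C)| - 1} = \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C) + 1} \le \frac{\mathrm{edges}(C)}{\mathrm{nonroots}(C)} - \epsilon' \quad \text{for some } \epsilon' > 0. \quad ■$$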
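Lemma 8.9 can likewise be confirmed exhaustively on small $t$-chains. The sketch below, an illustration with hypothetical names, takes the maximum of $\mathrm{edges}(S)/\mathrm{nonroots}(S)$ over vertex-induced $S$ that contain both roots ($0$ and $r$ in this labelling), omit at least one vertex, and have at least one non-root, and checks that it stays strictly below $\mathrm{dens}(C)$.

```python
from fractions import Fraction
from itertools import combinations


def t_chain_graph(t, r):
    """t-chain of length r: endpoints 0..r (roots 0 and r), t clique vertices per link."""
    edges = set()
    next_label = r + 1
    for i in range(r):
        clique = list(range(next_label, next_label + t))
        next_label += t
        for a, b in combinations([i, i + 1] + clique, 2):
            if (a, b) != (i, i + 1):
                edges.add((a, b))
    return next_label, edges


def max_proper_density(t, r):
    """Max edges(S)/nonroots(S) over induced S containing both roots,
    with S != V(C) and at least one non-root."""
    n, edges = t_chain_graph(t, r)
    roots = (1 << 0) | (1 << r)
    best = None
    for mask in range((1 << n) - 1):          # every S except V(C) itself
        if (mask & roots) != roots:
            continue
        nonroots = bin(mask).count("1") - 2
        if nonroots == 0:
            continue
        inside = sum(1 for a, b in edges if (mask >> a) & 1 and (mask >> b) & 1)
        ratio = Fraction(inside, nonroots)
        best = ratio if best is None else max(best, ratio)
    return best


n, E = t_chain_graph(3, 2)
print(Fraction(len(E), n - 2), max_proper_density(3, 2))
```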

Proof of Theorem 8.7: As in the proof of Theorem 8.4, we may assume that the adversary $A$ always elects not to place edges into the graph. Let $C = C(r)$ be a $(k-1)$-chain of length $r$ between two fixed vertices $x$ and $y$. So (see Fact 8.11):

$$-\frac{\mathrm{nonroots}(C)}{\mathrm{edges}(C)} = -\frac{kr - 1}{\frac{r}{2}[k(k+1) - 2]} = \frac{-2k}{k(k+1) - 2} + \frac{2}{r[k(k+1) - 2]}.$$

Thus, for sufficiently large $r$, for some $\epsilon > 0$, we have $p \ge n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\epsilon}$. By Lemma 8.9 we know $C$ is strictly balanced, so let $\epsilon^* > 0$ be the constant of Theorem 8.3 such that for $p \le n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\epsilon^*}$ we have $\Delta = \Delta(C, \mathcal{G}_{SB}(n,p,k)) = o(1)$. Because additional edges cannot decrease the probability of success, we may for the purposes of analysis assume that $\epsilon < \epsilon^*$. That is,

$$p = n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\epsilon} \text{ for some } r \in \mathbb{Z},\ 0 < \epsilon < \epsilon^*. \qquad (8.7)$$
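The identity in the display above, and the entries of Table 8.1, can be reproduced with exact fractions. In this illustrative sketch (function names are hypothetical), `finite_exponent` computes $-\mathrm{nonroots}(C)/\mathrm{edges}(C)$ from the counts in Fact 8.11 with $t = k-1$, and `limit_exponent` gives its $r \to \infty$ limit $-2k/[k(k+1)-2]$.

```python
from fractions import Fraction


def limit_exponent(k):
    """-2k / (k(k+1) - 2): the exponent in Theorem 8.7 (before the + eps)."""
    return Fraction(-2 * k, k * (k + 1) - 2)


def finite_exponent(k, r):
    """-nonroots(C)/edges(C) for a (k-1)-chain of length r (Fact 8.11, t = k-1)."""
    nonroots = k * r - 1
    edges = Fraction(r, 2) * (k * (k + 1) - 2)
    return -nonroots / edges


for k in (3, 4, 5, 6):               # the rows of Table 8.1
    print(k, limit_exponent(k), round(float(limit_exponent(k)), 3))
```

The correction term $2/(r[k(k+1)-2])$ is positive and shrinks as $r$ grows, which is why a sufficiently long chain makes $p \ge n^{-\mathrm{nonroots}(C)/\mathrm{edges}(C)+\epsilon}$ hold for some $\epsilon > 0$.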
We now examine all potential $(k-1)$-chains $C_i$ of length $r$ between $x$ and $y$. That is, all images over $\{x, y\}$ in $K_n$ of $C$ such that the image contains no edges between two vertices in the same color class in the adversary's $k$-coloring. Since in the balanced semi-random model there are $\Theta(n)$ vertices in each color class and since $r$ is constant, the number of potential $(k-1)$-chains is $m = \Theta(n^{\mathrm{nonroots}(C)})$. Because each edge in some such $C_i$ is placed into $G$ with probability $p$, for any given $C_i$, we have

$$\Pr[C_i \subseteq G] = p^{\mathrm{edges}(C)} = n^{-\mathrm{nonroots}(C) + \mathrm{edges}(C)\epsilon}.$$

So:

$$\begin{aligned}
M &= \prod_{i=1}^{m} \Pr[C_i \not\subseteq G] \\
&= \left(1 - n^{-\mathrm{nonroots}(C)+\mathrm{edges}(C)\epsilon}\right)^{\Theta(n^{\mathrm{nonroots}(C)})} \\
&\le e^{-\Theta(n^{\mathrm{edges}(C)\epsilon})} \\
&= o(n^{-2}).
\end{aligned}$$

Since $\delta = \Pr[C_i \subseteq G] = p^{\mathrm{edges}(C)} = o(1)$ and $\Delta = o(1)$, we have by Janson's inequality that $\Pr[x$ and $y$ are not connected by a $(k-1)$-chain of length $r$ in $G] \le M e^{\Delta/(1-\delta)} = M(1 + o(1)) = o(1/n^2)$. So, with high probability, all pairs of vertices from the same color class are connected by some such chain and Algorithm t-Chain succeeds. ■

8.5 Relating the balanced and unbalanced models 

For graphs of chromatic number 3, we had fairly good performance bounds even in the unbalanced model. However, for graphs of higher chromatic number, the algorithms required the number of vertices in each color class to be roughly balanced. The reason that the unbalanced case is harder is that if a color class is very small, then the noise rate $p$, as a function of the number of vertices in that class, is dramatically lower. So, if $(k-1)$ color classes are each small, the algorithm is essentially required to solve a problem for a much lower noise rate on the $(k-1)$-chromatic graph defined by those colors. In particular, one gets the following theorem.

Theorem 8.10 If $\mathrm{NP} \not\subseteq \mathrm{BPP}$, then for $k \ge 4$ there is no polynomial-time algorithm for $k$-coloring graphs in $\mathcal{G}_S(n,p,k)$ with high probability, for $p = n^{-\epsilon}$ for any constant $\epsilon > 0$.

Proof: Suppose otherwise; that is, there exists an algorithm $B$ for $k$-coloring graphs in $\mathcal{G}_S(n,p,k)$ for $p = n^{-\epsilon}$ for some constant $\epsilon > 0$, where $k \ge 4$. We show how to use $B$ to optimally color an arbitrary $(k-1)$-colorable graph in probabilistic polynomial time. Note that for $k \ge 4$, the problem of optimally coloring $(k-1)$-colorable graphs is NP-hard.

Let $G = (V,E)$ be a $(k-1)$-colorable graph on $n$ vertices. We create a new $N$-vertex graph $H = (V \cup V', F)$, where $V'$ is a set of vertices of size $n^{3/\epsilon}$ disjoint from $V$ and $N = n + n^{3/\epsilon}$, as follows. For each pair $u, v \in V$, if $(u, v) \in E$ then let $(u, v)$ be an edge in $F$ as well. Also, independently for each pair $v \in V$ and $v' \in V'$, let $(v, v')$ be an edge of $F$ with probability $1 - p$ for $p = N^{-\epsilon}$. Note that there are no edges in $F$ between vertices in $V'$. Now, feed graph $H$ to algorithm $B$. If $B$ $k$-colors $H$, then with high probability it must assign at most $k-1$ colors to $V$ and therefore $(k-1)$-color $G$. The reason is that otherwise there are $k$ vertices in $V$ all given different colors by $B$, and with probability $(1-p)^k \ge 1 - kp = 1 - o(1)$, any given vertex in $V'$ is connected to all $k$ of them (and in fact with extremely high probability, there will be some such vertex in $V'$). This forces $k+1$ colors to be used in $H$.
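The construction of $H$ in this proof is easy to express directly. The sketch below is illustrative only: the name `build_reduction_graph` is hypothetical, vertices $0, \ldots, n-1$ form $V$ and the rest form $V'$, and because $|V'| = \lceil n^{3/\epsilon} \rceil$ grows very quickly it is meant to be run only with small $n$ and moderate $\epsilon$.

```python
import math
import random


def build_reduction_graph(n, g_edges, eps, rng):
    """Theorem 8.10 construction (illustration): V = {0..n-1} carries G's edges,
    V' gets ceil(n^(3/eps)) fresh vertices, and each V-V' pair is joined
    independently with probability 1 - p, where p = N^(-eps)."""
    n_prime = math.ceil(n ** (3 / eps))
    N = n + n_prime
    p = N ** (-eps)
    edges = {frozenset(e) for e in g_edges}        # copy G onto V
    for v in range(n):
        for v_prime in range(n, N):
            if rng.random() < 1 - p:
                edges.add(frozenset((v, v_prime)))
    return N, p, edges                             # no edges inside V'


N, p, F = build_reduction_graph(4, [(0, 1), (1, 2), (2, 3)], 1.5, random.Random(0))
print(N, p, len(F))
```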

The main point of the proof is that an adversary with noise rate $p = N^{-\epsilon}$ can create a $k$-colorable graph on $N$ vertices in a distribution indistinguishable from that used to create $H$. In particular, as above, the adversary separates the $N$ vertices into one set $V$ of $n$ vertices and $k-1$ colors, and one set $V'$ of $N - n$ vertices and one color. It then attempts to place edges between vertices in $V$ exactly where they appear in the graph $G$ and to put in all edges between $V$ and $V'$. Since $n$ is so small (less than $N^{\epsilon/3}$), there are at most $N^{2\epsilon/3}$ potential edges in the set $V$. So for $p = N^{-\epsilon}$, with probability at least $1 - N^{-\epsilon/3}$, the adversary will be able to place exactly the edges it wishes between vertices in $V$ without any noise at all. Thus, since we assumed that $B$ can $k$-color graphs created by such an adversary with high probability, $B$ must $k$-color graph $H$ with high probability as well. ■



In the balanced model, our best bound for 3-coloring $G \leftarrow \mathcal{G}_{SB}(n,p,3)$ with high probability is $p \ge n^{-0.6+\epsilon}$. For the random model, we were able to 3-color for $p$ as low as $n^{-1+\epsilon}$. We leave as an open problem whether one can achieve such a low bound on $p$ for the semi-random model as well.



Chapter 9 

Lower bounds for independent set approximation 
based on approximate graph coloring 



In this chapter, we describe a lower bound for independent set approximation (or equivalently clique approximation) based on assumptions about the hardness of approximate graph coloring. Thought of in contrapositive form, we show how to get very good bounds for approximate graph coloring if we are given seemingly weak algorithms for approximating the maximum independent set in a graph. These results are corollaries to a basic technique of Berman and Schnitger [7], which they use to provide weaker lower bounds for independent set approximation based on other hard problems.

Let is$(G)$ denote the size of the largest independent set in graph $G$. For the Independent Set problem, we define the performance guarantee of an algorithm to be the worst-case ratio, over all graphs $G$ on $n$ vertices, of is$(G)$ to the size of the independent set found (with high probability if the algorithm is randomized) [12]. So, the lower the performance guarantee, the better the algorithm. The best performance guarantee known for a polynomial-time algorithm for Independent Set is $O(n/(\log n)^2)$, by Boppana and Halldorsson [12].

What we show in this chapter is that if there exists a polynomial-time algorithm with performance guarantee $O(n^{1-\epsilon})$ for Independent Set, then there is a polynomial-time algorithm to color $k$-colorable graphs with $O(\log n)$ colors and to color $(\log n)$-colorable graphs with polylog$(n)$ colors. The best algorithm known to date [22] for coloring $(\log n)$-colorable graphs uses more than $n/(\log n)^2$ colors. The best algorithm known for 3-colorable graphs (see Chapters 4 and 5 of this thesis) uses $\tilde{O}(n^{3/8})$ colors. So, this result shows that a performance that seems only somewhat better in approximating independent sets implies being able to do quite significantly better for approximate graph coloring.

9.1 Additional definitions and previous results 

Given a maximization (minimization) problem, we say an algorithm is a polynomial-time approximation scheme (PTAS) if for any constant $\epsilon > 0$, it runs in probabilistic polynomial time and finds a solution of value within a $(1 + \epsilon)$ factor of the maximum (minimum). For example, consider the problem MAX 2-SAT of finding a solution that maximizes the number of satisfied clauses in a 2-CNF expression. A PTAS for this problem would be an algorithm that for any $\epsilon > 0$, given a sufficiently large 2-CNF expression, finds an assignment that satisfies $1/(1 + \epsilon)$ of the maximum number of clauses possible.

| Where | Lower bound | Assumptions |
|---|---|---|
| FGLSS | constant | $\mathrm{NP} \not\subseteq n^{O(\log\log n)}$-TIME |
| FGLSS | $2^{(\log n)^{1-\epsilon}}$ | $\mathrm{NP} \not\subseteq n^{(\log n)^k}$-TIME |
| BS | $n^{\epsilon}$ | No PTAS for MAX SNP |
| Here | $n^{1-\epsilon}$ | No $(\log n)$-coloring algorithm for $k$-colorable graphs, or no polylog$(n)$-coloring for $(\log n)$-colorable graphs |
| Here | $n/2^{(\epsilon\log n)^{1/2}}$ | No $O(n^{\epsilon})$-coloring for $k$-colorable graphs in quasi-polynomial ($n^{\mathrm{polylog}(n)}$) time |

Table 9.1: Lower bounds for Independent Set approximation based on various assumptions.

MAX SNP is a syntactically defined class of problems described by Papadimitriou and Yannakakis [30]. It has the property that if one MAX SNP-hard or MAX SNP-complete problem has a polynomial-time approximation scheme, then all problems in MAX SNP do as well. Some MAX SNP-complete problems include MAX $k$-SAT for $k \ge 2$ (finding the maximum number of clauses satisfiable in a $k$-CNF formula), the problem Independent Set-$B$ of finding the largest independent set in a graph of constant degree bound $B \ge 4$, the TSP with edge weights 1 and 2, and others [30]. It is believed that no polynomial-time approximation schemes exist for these problems.

Berman and Schnitger prove that if there do not exist polynomial-time approximation schemes for MAX SNP-hard problems, then for some constant $\epsilon > 0$, no polynomial-time algorithm approximates Independent Set with performance guarantee $n^{\epsilon}$. In a recent result of a very different style, Feige, Goldwasser, Lovasz, Safra, and Szegedy [19] prove a lower bound for approximating independent sets based on NP not being contained in "quasi-polynomial time". In particular, they show that there is no polynomial-time algorithm for Independent Set with performance guarantee $O(2^{(\log n)^{1-\epsilon'}})$ for any $\epsilon' > 0$, if $\mathrm{NP} \not\subseteq \bigcup_k [n^{(\log n)^k}\text{-TIME}]$. In addition, they show there is no algorithm with constant performance guarantee for Independent Set if $\mathrm{NP} \not\subseteq n^{O(\log\log n)}$-TIME. Thus, they get a weaker conclusion than Berman and Schnitger (since $2^{(\log n)^{1-\epsilon'}} < n^{\epsilon}$ for all $\epsilon, \epsilon' > 0$ and sufficiently large $n$), but based on what is likely a much more solid assumption. The new results presented in this chapter go in the other direction, proving a much stronger conclusion, but based on what may be much less solid assumptions. The results are summarized in Table 9.1.
9.1.1 The basic idea of the new results

Berman and Schnitger prove their result on approximating independent sets by amplifying "approximation gaps" in a constraint satisfaction problem MAX kn. They then show that this problem can be reduced, in an approximation-preserving sense, to Independent Set, and that the MAX SNP-complete problem MAX 2SAT can be reduced to it in the same sense (Lemma 4.6 of [7]). Following the chain of reductions yields their $n^{\epsilon}$ bound. More simply, however, we can apply their basic technique directly to the independent set problem. Doing so allows us to relate the approximability of Independent Set not only to that of finding PTAS's for MAX SNP-hard problems (in this case, the problem Independent Set-$B$), but also to the problem of finding good approximations for graph coloring. In fact, this version of their procedure (in some ways more general, in some more specific than the procedure in [7]) can be thought of as a randomized version of a commonly used graph product, and we describe the procedure from this point of view.

9.2 Randomized graph products 

We now describe the randomized graph product technique that will be the key to the results presented in this chapter. The technique is formalized in the procedure Rand-Select below. Algorithm Rand-Select takes as input an $n$-vertex graph $G$ and values $r$, $p$, and $t$, and produces as output a new $N$-vertex graph $H$. The purpose of this procedure is to amplify gaps in independent set approximation. In particular, the procedure will reduce a problem of finding an independent set of size $n/t^p$ in an $n$-vertex graph containing an (unknown) independent set of size $n/t$, to a problem of finding an independent set of size $N/(n^r)^p$ in an $N$-vertex graph containing an independent set of size $N/n^r$, where $N = n^{rp+2}$. Thus, for example, if the original graph was 3-colorable and so contained an independent set of size $n/3$, then the problem of finding an independent set of size $n/9$ in the original graph (a factor of 3 smaller) is mapped to a problem of finding an independent set a factor of $n^r$ smaller than the largest independent set in the new $n^{2r+2}$-vertex graph. We now describe the procedure.

Algorithm Rand-Select (Variant of procedure in proof of Lemma 4.3 in [7])

Given: An $n$-vertex graph $G = (V, E)$ and values $r$, $p$, and $t$.

Output: An $n^{rp+2}$-vertex graph $H$, and a mapping $\varphi$ from subsets of $G$ to vertices of $H$.

1. Select $N = n^{rp+2}$ subsets of vertices, each of size $r\log_t n$, at random from the vertices of $G$. Label the subsets $s_1, s_2, \ldots, s_N$.
2. For each subset $s_i$, associate a vertex $w_i$ in $H$. The edge set $E(H) = \{(w_i, w_j) \mid s_i \cup s_j$ is not independent in $G\}$. That is, $(w_i, w_j)$ is not an edge in $H$ if and only if both $s_i$ and $s_j$ are independent sets and in addition there are no edges between any vertex in $s_i$ and any vertex in $s_j$. Define a mapping $\varphi(s_i) = w_i$ and $\varphi^{-1}(w_i) = s_i$. (See Figure 9.1.)

Figure 9.1: A sample mapping from sets $s_i$ in $G$ to vertices $w_i$ in $H$.

(Note that for this to be a polynomial-time procedure, we need the product $rp$ bounded above by a constant. The value $t$ need not be a constant: in fact, we will later plug in $t = \log n$ to apply this technique to $(\log n)$-colorable graphs.)
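The two steps of Rand-Select can be sketched in a few lines. The sketch below is illustrative and makes explicit assumptions not fixed by the text: each $s_i$ is drawn uniformly at random (without replacement) from $V(G)$, the size $r\log_t n$ is rounded to the nearest positive integer, and an optional `num_subsets` argument replaces $N = n^{rp+2}$ so that a tiny demonstration stays small. The names `rand_select` and `is_independent` are hypothetical.

```python
import math
import random
from itertools import combinations


def is_independent(vertices, adj):
    """True if no two of `vertices` are adjacent in G (adj: list of sets)."""
    return all(b not in adj[a] for a, b in combinations(vertices, 2))


def rand_select(adj, r, p, t, rng, num_subsets=None):
    """Sketch of Rand-Select.  Returns the subsets s_1..s_N and H as a dict of
    adjacency sets on 0..N-1.  num_subsets overrides N = n^(rp+2) for demos."""
    n = len(adj)
    N = num_subsets if num_subsets is not None else n ** (r * p + 2)
    size = max(1, round(r * math.log(n, t)))      # |s_i| = r log_t n, rounded
    subsets = [frozenset(rng.sample(range(n), size)) for _ in range(N)]
    H = {i: set() for i in range(N)}
    for i, j in combinations(range(N), 2):
        # (w_i, w_j) is an edge of H exactly when s_i U s_j is not independent.
        if not is_independent(subsets[i] | subsets[j], adj):
            H[i].add(j)
            H[j].add(i)
    return subsets, H


cycle6 = [{(v - 1) % 6, (v + 1) % 6} for v in range(6)]
subsets, H = rand_select(cycle6, r=1, p=1, t=2, rng=random.Random(1), num_subsets=30)
print(sum(len(nbrs) for nbrs in H.values()) // 2, "edges in H")
```

The pairwise loop makes the running time quadratic in $N$, consistent with the note above that $rp$ must be bounded by a constant for the procedure to run in polynomial time.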

Given a graph $G$ and new graph $H$ created using Rand-Select above, it will be convenient to extend the mapping $\varphi$ as follows. For $S \subseteq V(G)$, let $\varphi(S) = \{\varphi(s_i) \mid s_i \subseteq S\}$. Also, for $T$ a subset of $V(H)$, define $\varphi^{-1}(T) = \{v \mid v \in s_i$ for some $\varphi(s_i) \in T\} = \bigcup_{w \in T} \varphi^{-1}(w)$. Notice that $S \supseteq \varphi^{-1}(\varphi(S))$ and $T \subseteq \varphi(\varphi^{-1}(T))$; we do not necessarily get equality in the first case since $S$ may have elements not inside any $s_i \subseteq S$, and in the second case, for $w_i, w_j \in T$, the set $s_i \cup s_j$ may contain some $s_k$ for $w_k \notin T$. From this extended definition of $\varphi$ and the definition of $E(H)$ in step 2 of Rand-Select, we immediately get the following fact.

Fact 9.1 If $S$ is an independent set in $G$, then $\varphi(S)$ is an independent set in $H$. If $T$ is an independent set in $H$ of size at least 2, then $\varphi^{-1}(T)$ is independent in $G$.

Proof: If $w_i, w_j \in \varphi(S)$, then $s_i, s_j \subseteq S$. So if the edge $(w_i, w_j)$ were in $H$, then $s_i \cup s_j$ would be a non-independent subset of the independent set $S$, a contradiction. If $T$ is independent in $H$ of size greater than 1, then for each $w_i \in T$, the set $s_i$ must be independent in $G$. So, $\varphi^{-1}(T)$ is a union of independent sets that are pairwise independent of each other, and thus is independent itself. ■
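The extended mapping and Fact 9.1 can be exercised on a toy instance. In this illustrative sketch, vertices of $H$ are identified with subset indices, `phi` and `phi_inverse` are hypothetical names for $\varphi$ and $\varphi^{-1}$, and the random 2-element subsets of an 8-cycle are assumptions made only for the demonstration. The test checks the two containments noted above and both halves of Fact 9.1 for the independent set $S = \{0, 2, 4, 6\}$.

```python
import random
from itertools import combinations


def phi(S, subsets):
    """phi(S) = { w_i : s_i is contained in S }, with w_i represented by i."""
    return {i for i, s in enumerate(subsets) if s <= S}


def phi_inverse(T, subsets):
    """phi^{-1}(T) = union of s_i over w_i in T."""
    return set().union(*(subsets[i] for i in T))


def is_independent(vertices, adj):
    return all(b not in adj[a] for a, b in combinations(vertices, 2))


rng = random.Random(2)
cycle8 = [{(v - 1) % 8, (v + 1) % 8} for v in range(8)]
subsets = [frozenset(rng.sample(range(8), 2)) for _ in range(40)]
S = {0, 2, 4, 6}                                  # independent in the 8-cycle
T = phi(S, subsets)
print(sorted(T), sorted(phi_inverse(T, subsets)))
```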

The purpose of procedure Rand-Select is as follows. Let $H = $ Rand-Select$(G, r, p, t)$. If we have an independent set $S$ of size $n/t$ in graph $G$, then since each $s_i$ has probability about $(1/t)^{r\log_t n}$ of being chosen inside $S$, the expected size of $\varphi(S)$ in $H$ is $\Theta(n^{rp+2}(1/t)^{r\log_t n}) = \Theta(n^{r(p-1)+2})$. In fact, with high probability, $\varphi(S)$ will be about that large. However, the expected size of $\varphi(S')$ for $S'$ an independent set of size $n/t^p$ is only $\Theta(n^{rp+2}(1/t^p)^{r\log_t n}) = \Theta(n^2)$. In fact, it turns out that with high probability, $\varphi(S')$ will be small for all such $S'$ (this is the purpose of the "+2" in Rand-Select¹), as described in the following theorem.

Theorem 9.1 Let $G$ be an $n$-vertex graph with an independent set $S$ of size $n/t$ and let $H$ be the output of Rand-Select$(G, r, p, t)$. Then, with high probability, if $(r\log_t n)^2 = o(n/t^p)$ where $p \ge 1$, both of the following are true:

(1) $|\varphi(S)| \ge \frac{1}{2} n^{r(p-1)+2}$, and

(2) for every independent set $S'$ in $G$ of size at most $n/t^p$, we have $|\varphi(S')| < 4n^2$.

Note that (1) implies $|\varphi(S)| = \Omega(N/n^r)$ and (2) implies $|\varphi(S')| = O(N/n^{rp})$, for $N = |V(H)|$.

The proof of this theorem uses the following standard (Chernoff variant) probabilistic inequality (e.g., see [1]).

Fact 9.2 Suppose $X_1, \ldots, X_m$ are mutually independent $\{0,1\}$-valued random variables. Let $X = X_1 + X_2 + \cdots + X_m$ and let $\mu = E[X]$. Then:

$$\Pr[X \ge 2\mu] \le \left(\frac{e}{4}\right)^{\mu}, \qquad \Pr[X \le \mu/2] \le e^{-\mu/8}.$$
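Fact 9.2 can be compared numerically with exact binomial tails, a special case of sums of independent $\{0,1\}$ variables. The sketch below is only an illustration: `binomial_tails` is a hypothetical helper, and the $(m, q)$ pairs are arbitrary choices made so that $\mu = mq$ is a whole number.

```python
import math


def binomial_tails(m, q):
    """Exact Pr[X >= 2mu] and Pr[X <= mu/2] for X ~ Binomial(m, q)."""
    mu = m * q
    pmf = [math.comb(m, j) * q ** j * (1 - q) ** (m - j) for j in range(m + 1)]
    upper = sum(pmf[math.ceil(2 * mu):])
    lower = sum(pmf[: math.floor(mu / 2) + 1])
    return mu, upper, lower


for m, q in ((40, 0.25), (200, 0.05), (100, 0.2)):
    mu, upper, lower = binomial_tails(m, q)
    print(m, q, upper, (math.e / 4) ** mu, lower, math.exp(-mu / 8))
```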



Proof of Theorem 9.1: First, claim (1) (the easy half). Given $S \subseteq V(G)$ of size $n/t$, consider a run of algorithm Rand-Select. Let $X_i$ be a random variable such that $X_i = 1$ if $s_i \subseteq S$, and otherwise $X_i = 0$. So, $X = \sum_i X_i$ equals $|\varphi(S)|$. Since $S$ has size $n/t$, we have:

$$\begin{aligned}
\Pr[X_i = 1] &= \binom{n/t}{r\log_t n} \Big/ \binom{n}{r\log_t n} \\
&= \left(\frac{1}{t}\right)^{r\log_t n}(1 + o(1)) \qquad (\text{since } (r\log_t n)^2 = o(n/t)) \\
&= n^{-r}(1 + o(1)).
\end{aligned}$$

Thus, $E[X] = n^{r(p-1)+2}(1 + o(1))$ and with high probability, we have $X = |\varphi(S)| \ge \frac{1}{2} n^{r(p-1)+2}$.
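The approximation $\binom{n/t}{r\log_t n}/\binom{n}{r\log_t n} = n^{-r}(1+o(1))$ can be seen numerically. The sketch below computes the ratio exactly with fractions for one illustrative choice of parameters ($t = 2$, $r = 1$, $n = 2^{12}$, so $r\log_t n = 12$); these values and the helper name are assumptions for the demonstration, and the test also checks the analogous $n^{-rp}$ estimate with $p = 2$ that the proof uses for claim (2).

```python
from fractions import Fraction
from math import comb


def prob_subset_inside(n, set_size, k):
    """Exact Pr[a uniform k-subset of n vertices lies inside a fixed set of
    size set_size] = C(set_size, k) / C(n, k)."""
    return Fraction(comb(set_size, k), comb(n, k))


# Illustrative parameters: t = 2, r = 1, n = 2^12, so k = r log_t n = 12.
n, t, r = 2 ** 12, 2, 1
k = 12
exact = prob_subset_inside(n, n // t, k)
print(float(exact * n ** r))   # close to 1, i.e. Pr = n^{-r} (1 + o(1))
```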

Now, claim (2): we show that for every small set $S'$ (and thus every small independent set $S'$), $\varphi(S')$ has size less than $4n^2$. We do this by showing that each individual set $S'$ has an extremely low probability of having an image under $\varphi$ this large.



1 We can actually replace the "+2" with "+1" to get a slightly better bound in Theorem 9.2 with only a 
little extra effort. However, the precise value of the constant is not crucial for us unless it could somehow 
be made significantly less than 1. 
Fix a given set $S'$ in $G$ of size $n/t^p$. Let $X'_i = 1$ if $s_i \subseteq S'$, and $X'_i = 0$ if $s_i \not\subseteq S'$, and let $X' = \sum_i X'_i$. Again, since $(r\log_t n)^2 = o(n/t^p)$, we have:

$$\Pr[X'_i = 1] = \left(\tfrac{1}{t^p}\right)^{r\log_t n}(1+o(1)) = n^{-rp}(1+o(1)).$$

Thus, $0.5n^2 \le E[X'] \le 2n^2$. Since $4n^2 \ge 2E[X']$ and $E[X'] \ge 0.5n^2$, applying Fact 9.2 gives $\Pr[X' \ge 4n^2] < (e/4)^{0.5n^2}$, which equals $2^{-cn^2}$ for some constant $c > 0$. Note that if $S'$ has size less than $n/t^p$, then $\Pr[X' \ge 4n^2]$ can only be lower. Now the crucial point: the probability that $X' \ge 4n^2$ is so small that even if we now sum over all at most $2^n$ such sets $S'$, we get that $\Pr[|\varphi(S')| \ge 4n^2$ for some $S'$ of size at most $n/t^p]$ is no more than $2^n \cdot 2^{-cn^2} = o(1)$. Thus, with high probability, both conclusions of Theorem 9.1 hold. ■

So, algorithm Rand-Select maps a problem of finding an independent set of size $1/t^{p-1}$ times the largest in the original $n$-vertex graph to a problem of finding one of size $1/n^{rp-r} = 1/N^{(rp-r)/(rp+2)}$ times the largest in the new $N$-vertex graph. In particular, one gets the following theorem.

Theorem 9.2 Suppose there exists a (randomized) algorithm $A$ for Independent Set on $N$-vertex graphs that runs in time $f(N)$ and has performance guarantee $\le \frac{1}{8}N^{(rp-r)/(rp+2)}$ for constants $r, p$. Then, there is a randomized algorithm $B$ that on $n$-vertex graphs containing an independent set of size $n/t$, finds an independent set of size $n/t^p$ with high probability in time $f(n^{rp+2}) + O(n^{rp+O(1)})$, so long as $(r\log_t n)^2 = o(n/t^p)$.

Proof: Given an $n$-vertex graph $G$ with independent set $S$ of size $n/t$, run algorithm Rand-Select$(G, r, p, t)$ to create graph $H$ on $N = n^{rp+2}$ vertices. This step takes $O(n^{rp+O(1)})$ time. By Fact 9.1, we know that $\varphi(S)$ is independent in $H$, so by Theorem 9.1 claim (1), with high probability $H$ contains an independent set of size $\frac{1}{2}n^{rp-r+2}$. Since $N^{(rp-r)/(rp+2)} = n^{rp-r}$, algorithm $A$ in time $f(n^{rp+2})$ finds an independent set $T$ in $H$ of size at least:

$$\frac{\frac{1}{2}n^{rp-r+2}}{\frac{1}{8}n^{rp-r}} = 4n^2.$$

Now, look at $S' = \varphi^{-1}(T)$, which is independent in $G$ by Fact 9.1. We know, by definition of $\varphi$, that $\varphi(S') \supseteq T$ and so $|\varphi(S')| \ge 4n^2$. Thus by Theorem 9.1 claim (2), with high probability $S'$ must have size at least $n/t^p$. ■

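If we plug $p = 1 + \epsilon$ into Theorem 9.2 and let $r = 2(1-\delta)/(\delta p)$, so that $\frac{rp}{rp+2} = 1 - \delta$, then:

$$\frac{rp-r}{rp+2} = \frac{p-1}{p}\cdot\frac{rp}{rp+2} = \frac{\epsilon}{1+\epsilon}(1-\delta) \ge \frac{\epsilon}{1+\epsilon} - \delta.$$

So, if we view Theorem 9.2 in the contrapositive form, we get the following corollary.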
Corollary 9.3 If for some $t$ and some constant $\epsilon > 0$ there is no randomized polynomial-time algorithm which finds an independent set of size $n/t^{1+\epsilon}$ on $n$-vertex graphs containing an independent set of size $n/t$, then: for any constant $\delta > 0$ there is no (randomized) polynomial-time algorithm with performance guarantee $o(n^{\epsilon/(1+\epsilon)-\delta})$ for general Independent Set.
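The choice of $r$ behind Corollary 9.3 is pure arithmetic, so it can be verified exactly with rational numbers. The Python sketch below uses the illustrative values $\epsilon = 1/3$ and $\delta = 1/10$. It confirms that $r = 2(1-\delta)/(\delta p)$ gives $rp/(rp+2) = 1 - \delta$, and that the exponent $(rp-r)/(rp+2)$ equals $\frac{\epsilon}{1+\epsilon}(1-\delta)$. In this toy choice $r$ does not come out an integer.

```python
from fractions import Fraction


def product_exponent(r, p):
    """(rp - r)/(rp + 2): the exponent with n^(rp - r) = N^((rp - r)/(rp + 2)), N = n^(rp + 2)."""
    return Fraction(r * p - r) / Fraction(r * p + 2)


def choose_r(delta, p):
    """r = 2(1 - delta)/(delta p), chosen so that rp/(rp + 2) = 1 - delta."""
    return 2 * (1 - delta) / (delta * p)


eps, delta = Fraction(1, 3), Fraction(1, 10)
p = 1 + eps
r = choose_r(delta, p)
print(r, r * p / (r * p + 2), product_exponent(r, p), eps / (1 + eps) - delta)  # 27/2 9/10 9/40 3/20
```

The final inequality $\frac{\epsilon}{1+\epsilon}(1-\delta) \ge \frac{\epsilon}{1+\epsilon} - \delta$ holds because $\epsilon/(1+\epsilon) \le 1$.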

This corollary immediately implies the Berman-Schnitger result on the approximability of Independent Set [7] by using the MAX SNP-hard problem Independent Set-$B$. Any $n$-vertex graph with degree at most $B$ must have an independent set of size $n/(B+1)$. So, we get the following.

Corollary 9.4 (Berman and Schnitger) If there do not exist randomized PTAS's for MAX SNP-hard problems, then there exists $c > 0$ such that Independent Set does not have a (randomized) polynomial-time approximation algorithm with performance guarantee $n^c$.

In particular, if for some $\epsilon$ and $B$, Independent Set-$B$ does not have a randomized polynomial-time approximation algorithm with performance guarantee $(1+\epsilon)$, then Independent Set does not have a polynomial-time approximation algorithm with performance guarantee $o(n^{\epsilon'/(1+\epsilon')-\delta})$ for $\epsilon' = \log_{B+1}(1+\epsilon)$, for any constant $\delta > 0$.
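The value $\epsilon' = \log_{B+1}(1+\epsilon)$ is chosen so that, with $t = B+1$, the target size $n/t^{1+\epsilon'}$ in Corollary 9.3 equals $n/((B+1)(1+\epsilon))$. That is exactly a factor $(1+\epsilon)$ below the guaranteed independent set of size $n/(B+1)$. A short numeric check in Python, with arbitrary sample values:

```python
import math


def eps_prime(eps, B):
    """epsilon' = log_{B+1}(1 + epsilon)."""
    return math.log(1 + eps, B + 1)


def reduction_target(n, B, eps):
    """n / t^(1 + eps') with t = B + 1: the set size Corollary 9.3 asks for."""
    return n / (B + 1) ** (1 + eps_prime(eps, B))


n, B, eps = 10_000, 3, 0.5
e1 = eps_prime(eps, B)
print(reduction_target(n, B, eps), n / ((B + 1) * (1 + eps)), e1 / (1 + e1))
```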

We can also use Theorem 9.2 to provide stronger bounds on independent-set approximation based on assumptions of the hardness of approximate graph coloring. In particular, we can prove the following.

Theorem 9.5 Suppose there exists a (randomized) polynomial-time algorithm $A$ for Independent Set with performance guarantee $n^{1-\epsilon}$ for some $\epsilon > 0$. Then, there is a randomized polynomial-time algorithm $B$ that will color any $n$-vertex $k$-colorable graph with $O(\log n)$ colors, and color any $n$-vertex $O(\log n)$-colorable graph with $O(\log^c n)$ colors (where $c \le 1 + 3/\epsilon$).

Theorem 9.6 Suppose there exists a (randomized) quasipolynomial-time algorithm $A$ for Independent Set with performance guarantee $N/2^{\sqrt{c\log N}}$ on $N$-vertex graphs. Then there is a randomized quasipolynomial-time algorithm $B$ to color any $n$-vertex $k$-colorable graph with $O(n^\epsilon)$ colors, where $\epsilon = (20\log k)/c$.

Proof of Theorems 9.5 and 9.6: Given an $n$-vertex $t$-colorable graph $G$, we know there exists an independent set of size at least $n/t$. Suppose we had an algorithm $B'$ that on any $n$-vertex graph with an independent set of size $n/t$ were guaranteed to find an independent set of size $n/t^p$ for some constant $p$. We could then find a coloring of $G$ with at most $(t^p\ln n)$ colors by applying $B'$, coloring the independent set found with one color, and then repeating on the remaining graph $G'$ of size at most $n(1 - 1/t^p)$. Note that since $G$ is $t$-colorable, so is graph $G'$, and thus $G'$ has an independent set of at least $1/t$ of its vertices as well, and we may reapply $B'$. The number of colors used by this algorithm $B$ is at most a value $C$ such that $n(1 - 1/t^p)^C = 1$, so $C = -(\ln n)/\ln(1 - 1/t^p) \le t^p\ln n$ (using $-\ln(1-x) \ge x$). Thus if $t$ is some constant $k$, the number of colors used is $O(\log n)$, and if $t = \log n$, the number of colors used is $O((\log n)^{p+1})$. (The fact that $\log n$ decreases as the graph gets smaller only helps.)
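The coloring loop just described can be sketched in Python. Here `find_independent_set` plays the role of $B'$. Since $B'$ is only hypothesized, the sketch plugs in a greedy maximal independent set as a placeholder, and that placeholder carries no $n/t^p$ guarantee. A second helper counts the rounds in the worst case allowed by the guarantee, where each round removes exactly $\lceil n'/t^p \rceil$ of the $n'$ remaining vertices, so the count can be compared with $t^p\ln n$. All names are hypothetical.

```python
import math


def greedy_independent_set(adj, alive):
    """Placeholder for B': greedy maximal independent set of the uncolored vertices,
    smallest remaining degree first. NOT the B' of the text (no n/t^p guarantee)."""
    chosen = set()
    for v in sorted(alive, key=lambda u: (len(adj[u] & alive), u)):
        if not adj[v] & chosen:
            chosen.add(v)
    return chosen


def color_by_independent_sets(adj, find_independent_set=greedy_independent_set):
    """Give each independent set found a fresh color, remove it, and repeat."""
    alive = set(adj)
    coloring = {}
    color = 0
    while alive:
        independent = find_independent_set(adj, alive)
        for v in independent:
            coloring[v] = color
        alive -= independent
        color += 1
    return coloring


def worst_case_rounds(n, t, p):
    """Rounds needed if every round removes exactly ceil(remaining / t^p) vertices."""
    remaining, rounds = n, 0
    while remaining > 0:
        remaining -= -(-remaining // t**p)  # integer ceiling division
        rounds += 1
    return rounds


# Usage: a 7-cycle (3-colorable), and the worst-case round count against t^p ln n.
cycle7 = {v: {(v - 1) % 7, (v + 1) % 7} for v in range(7)}
coloring = color_by_independent_sets(cycle7)
print(max(coloring.values()) + 1, worst_case_rounds(1000, 3, 1), 3 * math.log(1000))
```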

If there exists a polynomial-time algorithm $A$ for Independent Set with performance guarantee $n^{1-\epsilon}$ for some constant $\epsilon > 0$, then for $p > 3/\epsilon - 2$ (so that $3/(p+2) < \epsilon$), algorithm $A$ has performance guarantee $o(N^{1-3/(p+2)})$ on $N$-vertex graphs. Since $1 - 3/(p+2) = (p-1)/(p+2)$, we can apply Theorem 9.2 with $r = 1$ to get a randomized polynomial-time algorithm $B'$ with the guarantee we need. This proves Theorem 9.5.
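The threshold on $p$ in the proof of Theorem 9.5 can be checked with exact fractions. With $r = 1$, Theorem 9.2 needs $1 - \epsilon < (p-1)/(p+2)$. The Python sketch below finds the smallest integer $p$ satisfying this for a few sample values of $\epsilon$. It confirms that $p - 1$ fails, and that the resulting exponent $c = p + 1$ in $O(\log^c n)$ stays within $1 + 3/\epsilon$. The helper names are illustrative.

```python
import math
from fractions import Fraction


def theorem_92_exponent(p, r=1):
    """(rp - r)/(rp + 2); for r = 1 this equals 1 - 3/(p + 2)."""
    return Fraction(r * p - r, r * p + 2)


def smallest_p(eps):
    """Smallest integer p with p > 3/eps - 2, equivalently 3/(p + 2) < eps."""
    return max(1, math.floor(3 / eps - 2) + 1)


for eps in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 10)):
    p = smallest_p(eps)
    # Need 1 - eps < (p - 1)/(p + 2); the color exponent c = p + 1 should be <= 1 + 3/eps.
    print(eps, p, 1 - eps < theorem_92_exponent(p), p + 1 <= 1 + 3 / eps)
```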

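For Theorem 9.6, we must be a bit careful.² The quasi-polynomial time algorithm $B$ is as follows. Given an $n$-vertex $k$-colorable graph $G$, let $p = \epsilon\log_k n$, so $n^\epsilon = k^p$, and let $N = n^{p+2}$. Plugging in these values, we get (for $p \ge 2$):

$$\begin{aligned}
2^{\sqrt{c\log N}} &= N^{\sqrt{c/((p+2)\log n)}} \\
&\ge N^{\sqrt{c/(2p\log n)}} \\
&= N^{\sqrt{c\epsilon/(2p^2\log k)}} \quad (\text{using } \log n = (p\log k)/\epsilon) \\
&= N^{\sqrt{10}/p} \quad (\text{using } \epsilon = (20\log k)/c) \\
&> N^{3/p}.
\end{aligned}$$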

Thus, the performance guarantee $N/2^{\sqrt{c\log N}}$ of algorithm $A$ for Independent Set on $N$-vertex graphs is $o(N/N^{3/p}) = o(N^{1-3/p})$, and hence also $o(N^{1-3/(p+2)})$. So, we can again apply Theorem 9.2 with $r = 1$ to get an algorithm $B$ guaranteed on any $k$-colorable graph to find an independent set of size $n/k^p = n^{1-\epsilon}$. Thus, $B$ makes progress (in fact, progress type 1 of Section 3.3) towards an $O(n^\epsilon)$-coloring of $G$.

Since algorithm $A$ is quasipolynomial, algorithm $B$ runs in time quasipolynomial in $n^{p+2}$, which is quasipolynomial in $n$ since $n^{p+2} = n^{O(\log n)}$. ■
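The chain of inequalities in the proof of Theorem 9.6 can be checked numerically by working with base-2 logarithms, which avoids huge numbers. With $\epsilon = (20\log k)/c$, the parameter $p = \epsilon\log_k n$ simplifies to $20\log n/c$. The Python sketch below takes a few illustrative $(\log n, k, c)$ triples and compares $\log_2$ of $2^{\sqrt{c\log N}}$, $N^{\sqrt{10}/p}$, and $N^{3/p}$.

```python
import math


def theorem_96_chain(log2_n, k, c):
    """Return (eps, p, lhs, mid, rhs), all exponents as base-2 logarithms:
    lhs = log2 of 2^sqrt(c log N), mid = log2 of N^(sqrt(10)/p), rhs = log2 of N^(3/p)."""
    eps = 20 * math.log2(k) / c
    p = eps * log2_n / math.log2(k)      # p = eps * log_k n, so that n^eps = k^p
    log2_N = (p + 2) * log2_n            # N = n^(p + 2)
    lhs = math.sqrt(c * log2_N)
    mid = math.sqrt(10) / p * log2_N
    rhs = 3 / p * log2_N
    return eps, p, lhs, mid, rhs


eps, p, lhs, mid, rhs = theorem_96_chain(log2_n=100, k=3, c=100)
print(round(eps, 3), round(p, 3), lhs >= mid > rhs)
```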



2 Note: it is easy to fall into a trap in Theorem 9.2 by falsely thinking that if $p$ is a function of $n$ (e.g., $p = \epsilon\log n$) for algorithm $B$, then we can plug in the same function of $N$ (e.g., $\epsilon\log N$) for algorithm $A$.



Chapter 10 

Possibilities for improvement, open problems, and 
conclusion 



10.1 Possibilities for improvement 

Algorithm First-Approx performs most poorly when (1) many vertices share about $n^{0.2}$ neighbors in common, and (2) the average vertex degree is about $n^{0.4}$. If the edges in the graph were distributed randomly, this combination of events would likely not occur, since for such a low average degree, any two given vertices would be expected to share less than one neighbor in common. Instead, the graph must contain high-density regions. For example, a graph could have properties (1) and (2) above if it consists of a collection of "clusters" of size $\Theta(n^{0.6})$ such that each vertex inside a given cluster has $\Theta(n^{0.4})$ neighbors within the cluster and $\Theta(n^{0.4})$ neighbors distributed throughout the other clusters. Thus, if the edges within a cluster are distributed randomly, then two vertices inside the same cluster share on average $\Theta((n^{0.4})^2/n^{0.6}) = \Theta(n^{0.2})$ neighbors in common, even though the degrees are low. (The purpose of giving each vertex $\Theta(n^{0.4})$ neighbors in the other clusters is so that the distance-2 neighbor set $N(N(v))$ for each vertex $v$ may have size $\Omega(n^{0.8})$, to avoid immediately making progress through Corollary 3.2.)
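The cluster calculation can be illustrated by simulation. With $n = 2^{15}$, the numbers work out to cluster size $n^{0.6} = 512$, in-cluster degree $n^{0.4} = 64$, and predicted overlap $n^{0.2} = 8$. The Python sketch below models one cluster as a random graph $G(512, 64/512)$, which is an assumption standing in for "edges within a cluster distributed randomly". It then measures the average number of common neighbors of random vertex pairs.

```python
import random
from itertools import combinations


def random_cluster(size, degree, seed=0):
    """One cluster, modeled (as an assumption) by the random graph G(size, degree/size)."""
    rng = random.Random(seed)
    q = degree / size
    adj = {v: set() for v in range(size)}
    for u, v in combinations(range(size), 2):
        if rng.random() < q:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_common_neighbors(adj, pairs=500, seed=1):
    """Average |N(u) & N(v)| over randomly chosen pairs u != v."""
    rng = random.Random(seed)
    vertices = list(adj)
    total = 0
    for _ in range(pairs):
        u, v = rng.sample(vertices, 2)
        total += len(adj[u] & adj[v])
    return total / pairs


n = 2**15
size, degree = round(n**0.6), round(n**0.4)  # 512 and 64
cluster = random_cluster(size, degree)
print(size, degree, degree**2 / size, average_common_neighbors(cluster))
```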

Algorithm Improved-Approx achieves better performance by taking advantage of such high-density regions when they are found. However, one other possible approach is the following. Suppose that by removing 9/10 of the edges in the graph, one could somehow get rid of such high-density regions and prove a stronger analog of Theorem 4.1 (bounding the number of shared neighbors of two vertices). Then, Theorems 4.5 and 4.6 would still apply to show that some set $T = N_1(N(v) \cap I_j)$ in the new graph is both large and has a large
fraction of its vertices red. The main point here is that even though an independent set in 
the new graph might not be an independent set in the original graph, there still must be 
some color class in a 3-coloring of the original graph that satisfies the A = 1/2 condition (see 
Theorem 4.5) in the new graph. Also, the average degree has only changed by a constant 
factor, so the set T produced will still be large. One small difficulty is that Corollary 4.7 
relies on a large minimum degree which might no longer exist in the new graph. This 
problem can be overcome by simply deleting all vertices with degree less than, say, 1/10 of
the average in the new graph. 

A different way one might be able to do significantly better is to consider distance-3 neighborhoods of vertices (or perhaps even distance-$i$ neighborhoods for larger $i$). From preliminary calculations, I believe that some of the results for distance-2 neighborhoods may go through; for example, that one could find a set $T$ with an independent set of 3/8 of its vertices inside the distance-3 neighborhood. (Note that if the edges were distributed randomly, one would expect a ratio of 3/8 : 1/4 : 3/8 of blues to reds to greens inside the distance-3 neighborhood of $v$ for $v \in$ red.) However, all the techniques given here for forcing expansion (that is, for forcing the set found to be large) seem to break down completely.
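The 3/8 : 1/4 : 3/8 ratio can be reproduced with a small exact computation. The Python sketch below idealizes a random 3-colored graph as a walk on colors in which each step moves to each of the two other colors with probability 1/2. It ignores repeated vertices, so it is a heuristic for "distributed randomly" rather than an exact count of distinct distance-3 neighbors.

```python
from fractions import Fraction

COLORS = ("red", "blue", "green")


def walk_color_distribution(start, steps):
    """Color distribution after `steps` edges of an idealized random walk in a randomly
    3-colored graph: each step moves to each of the two other colors with probability 1/2."""
    dist = {c: Fraction(1 if c == start else 0) for c in COLORS}
    for _ in range(steps):
        nxt = {c: Fraction(0) for c in COLORS}
        for c, mass in dist.items():
            for d in COLORS:
                if d != c:
                    nxt[d] += mass / 2
        dist = nxt
    return dist


print(walk_color_distribution("red", 3))
# {'red': Fraction(1, 4), 'blue': Fraction(3, 8), 'green': Fraction(3, 8)}
```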

10.2 Open problems and conclusion 

We have described here an algorithm guaranteed to color any 3-chromatic graph with $O(n^{3/8})$ colors in the worst case, and shown how these techniques can be used to improve previous bounds for coloring $k$-chromatic graphs for $k > 3$ as well. Clearly, however, there remains a long way to go. There is no reason to believe an $O(n^{3/8})$ bound is intrinsic to the coloring problem. In fact, for coloring 3-colorable graphs, to date there is no lower bound known greater than 3. That is, it remains unknown whether there is any intrinsic reason why one could not 4-color any given 3-colorable graph in polynomial time. It would be a very significant contribution to this area if one could make headway in this direction. For the general problem of coloring graphs of arbitrary chromatic number, the best lower bound remains a factor of $2 - \epsilon$, from 1976 by Garey and Johnson [20].

The random and "semi-random" cases appear much easier. We have described here an algorithm to color a random $k$-colorable graph in the model $\mathcal{G}(n,p,k)$ for $p$ as low as $n^{-1+\epsilon}$ (see Chapter 7). For even smaller values of $p$, perhaps some other strategy might work well. An intriguing open question is whether there might be a polynomial-time algorithm to color graphs in $\mathcal{G}(n,p,k)$ for every $p$, or whether there is some intrinsic reason such an algorithm should not exist. Experimental work of Petford and Welsh [31] suggests that, at least for the heuristics used there, low values of $p$ for which the average degree in the graph is about 5 or 6 may be the hardest.

For the semi-random model, we described in Chapter 8 an algorithm to color graphs in $\mathcal{G}_{SB}(n,p,3)$ for $p$ as low as $n^{-0.6+\epsilon}$, and for higher values of $p$ for $k > 3$ (see Table 8.1). One obvious open question is whether one can optimally color such graphs for lower noise rates $p$. A second open direction to explore is coloring graphs based on even "harder" semi-random sources that have been proposed and studied in the cryptographic literature. In these models, the "noise" is not independent over each bit; rather, we are simply guaranteed that no sequence of bits of some length occurs too often. In a graph setting, this might correspond to a model in which we are simply guaranteed that for any given collection of "potential edges," no fixed configuration occurs with more than some specified probability. The reader is referred to papers of Chor and Goldreich [17] and Zuckerman [44] for more details on these "weak random" models.



Appendix A 

The Vertex-Cover / Independent-Set 
approximation algorithm 



We now describe a simplified version of the Vertex-Cover approximation algorithm of Bar-Yehuda and Even [4] and Monien and Speckenmeyer [28], specialized to its use in this thesis. The version here is taken from a treatment given by Boppana and Halldorsson [12]. We will describe the algorithm as an Independent Set approximation algorithm for the special case where the input $n$-vertex graph contains an independent set of at least $\frac{1}{2}(1 - \frac{1}{\log n})$ of its vertices. The output of the procedure is an independent set of size $\Omega(n/\log n)$.

Algorithm Approx-IS [Simplified version of the BE/MS algorithm]

Given: An $n$-vertex graph $G$ which has an independent set of size at least $\frac{1}{2}(1 - \frac{1}{\log n})n$.

Output: An independent set of size $\Omega(n/\log n)$.

1. Remove (the vertices of) all odd cycles of length $\le 2l + 1$ for $l = \frac{\log n}{6} - \frac{1}{2}$. See Note 1 below.
(Assume for simplicity that $\frac{\log n}{6} - \frac{1}{2}$ is an integer.)

2. Initialize $I$, the independent set found, to $\emptyset$.

3. Choose $v \in V$.

4. For $i \in \{0, \ldots, l\}$, let $V_i$ = the set of vertices of distance $i$ from $v$.

5. For $i \in \{0, \ldots, l\}$, let $S_i = V_i \cup V_{i-2} \cup V_{i-4} \cup \cdots$.

Note that $S_i$ is an independent set since there are no odd cycles of length $\le 2l + 1$. For example, an edge between two vertices of $V_3$ would create an odd cycle of length at most 7. (Edges between $V_j$ and $V_{j-2}$ cannot occur, by the definition of distance.)

Also, note that $N(S_i) = S_{i+1}$.

6. Let $i_0 \le l$ be an index such that $|N(S_{i_0})| \le n^{1/(l+1)}|S_{i_0}|$.

This property must hold for some $i \in \{0, \ldots, l\}$ because otherwise:

$$|N(S_l)| > n^{1/(l+1)}|S_l| > n^{2/(l+1)}|S_{l-1}| > n^{3/(l+1)}|S_{l-2}| > \cdots > n^{(l+1)/(l+1)}|S_0| = n,$$

a contradiction.

7. Let $I \leftarrow I \cup S_{i_0}$ and let $V \leftarrow V - S_{i_0} - N(S_{i_0})$.

If $V$ is non-empty, then go back to Step 3. Otherwise output set $I$.
See Note 2 below.
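Steps 2 through 7 of Approx-IS can be sketched in Python as follows. Step 1 (removing short odd cycles) is not implemented. The sketch therefore assumes the input already has no odd cycle of length $\le 2l+1$, as any bipartite graph does. It computes $N(S_i)$ directly rather than relying on the identity $N(S_i) = S_{i+1}$. The function names are hypothetical and `min(alive)` is just a deterministic choice for Step 3.

```python
from collections import deque


def bfs_layers(adj, alive, root, depth):
    """Layers V_0, ..., V_depth of vertices at distance i from root, inside `alive`."""
    dist = {root: 0}
    layers = [{root}]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue
        for w in adj[u]:
            if w in alive and w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] == len(layers):
                    layers.append(set())
                layers[dist[w]].add(w)
                queue.append(w)
    return layers


def approx_is_steps_2_to_7(adj, l):
    """Steps 2-7 of Approx-IS. Assumes Step 1 already ran: no odd cycle of length <= 2l + 1."""
    factor = len(adj) ** (1 / (l + 1))
    alive = set(adj)
    independent = set()                                   # Step 2
    while alive:
        v = min(alive)                                    # Step 3 (any vertex works)
        layers = bfs_layers(adj, alive, v, l)             # Step 4
        for i in range(l + 1):
            # Step 5: S_i = V_i u V_{i-2} u ... (missing layers are empty)
            s_i = set().union(*(layers[j] for j in range(i, -1, -2) if j < len(layers)))
            n_s_i = {w for u in s_i for w in adj[u] if w in alive}
            if len(n_s_i) <= factor * len(s_i):           # Step 6
                break
        else:
            raise ValueError("no valid i_0: the graph seems to have a short odd cycle")
        independent |= s_i                                # Step 7
        alive -= s_i | n_s_i
    return independent


# Usage on a 6x6 grid graph (bipartite, so Step 1 removes nothing).
grid = {6 * r + c: {6 * rr + cc for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < 6 and 0 <= cc < 6}
        for r in range(6) for c in range(6)}
print(sorted(approx_is_steps_2_to_7(grid, l=2)))
```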

Note 1: Step 1 removes all odd cycles of length $\le 2l + 1$. An odd cycle of length $2i + 1$ may have at most $i$ vertices in any independent set in $G$. So, if $m$ vertices remain after this step (so $n - m$ are removed), we have removed at most $\frac{l}{2l+1}(n - m)$ vertices from any independent set in $G$. Thus, the maximum independent set in $G$ may have size at most $m + \frac{l}{2l+1}(n - m)$. This implies that the number of vertices $m$ remaining is at least $n/\log n$, since otherwise (the expression being increasing in $m$),

$$\begin{aligned}
m + (n-m)\frac{l}{2l+1} &= m + (n-m)\frac{\frac{\log n}{6} - \frac{1}{2}}{\frac{\log n}{3}} \\
&< \frac{n}{\log n} + \left(n - \frac{n}{\log n}\right)\left(\frac{1}{2} - \frac{3}{2\log n}\right) \\
&= \frac{n}{2} - \frac{n}{\log n} + \frac{3n}{2\log^2 n} \\
&< \frac{1}{2}\left(1 - \frac{1}{\log n}\right)n \quad (\text{for } n \text{ sufficiently large}).
\end{aligned}$$

This contradicts our assumption on the largest independent set in $G$.
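The algebra in Note 1 can be spot-checked numerically. The Python sketch below normalizes $n = 1$ and writes $L$ for $\log n$. It checks that at $m = n/L$ the bound equals the closed form $n/2 - n/L + 3n/(2L^2)$, and that this stays below $\frac{1}{2}(1 - 1/L)n$. The sample values of $L$ are arbitrary, and the inequality needs only $L > 3$.

```python
def note1_bound(n, m, L):
    """m + (n - m) * l/(2l + 1) with l = L/6 - 1/2, where L stands for log n."""
    l = L / 6 - 0.5
    return m + (n - m) * l / (2 * l + 1)


def note1_closed_form(n, L):
    """The value at m = n/L: n/2 - n/L + 3n/(2 L^2)."""
    return n / 2 - n / L + 3 * n / (2 * L**2)


def note1_target(n, L):
    """The assumed lower bound (1/2)(1 - 1/L) n on the largest independent set."""
    return 0.5 * (1 - 1 / L) * n


for L in (12, 30, 60):
    print(L, note1_bound(1.0, 1 / L, L), note1_closed_form(1.0, L), note1_target(1.0, L))
```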

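Note 2: By Note 1, after Step 1 we know graph $G$ has at least $n/\log n$ vertices. Each application of Step 6 removes from $V$ at most $O(n^{1/(l+1)})$ times as many vertices as it adds to $I$. So, the final set $I$ reported in Step 7 must be large enough so that $|I|\,n^{1/(l+1)} = \Omega(n/\log n)$. That is, it must be the case that:

$$|I| = \Omega\!\left(\frac{n^{l/(l+1)}}{\log n}\right).$$

For $l = \frac{\log n}{6} - \frac{1}{2}$, we have:

$$\frac{l}{l+1} = \frac{\frac{\log n}{6} - \frac{1}{2}}{\frac{\log n}{6} + \frac{1}{2}} = \frac{\log n - 3}{\log n + 3} \ge \frac{\log n - 6}{\log n} = 1 - \frac{6}{\log n}.$$

So, finally, this implies that (with logarithms base 2):

$$|I| = \Omega\!\left(\frac{n^{1 - 6/\log n}}{\log n}\right) = \Omega\!\left(\frac{n}{\log n}\,2^{-6}\right) = \Omega(n/\log n). \quad ■$$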
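The exponent manipulation in Note 2 can also be checked numerically, again writing $L = \log_2 n$. The Python sketch below confirms that $l/(l+1) = (L-3)/(L+3) \ge 1 - 6/L$ and that $n^{6/L} = 2^6 = 64$. Together these show that the loss factor $n/n^{l/(l+1)}$ is at most 64. The values of $L$ are sample choices.

```python
def note2_ratio(L):
    """l/(l + 1) for l = L/6 - 1/2, where L = log2 n."""
    l = L / 6 - 0.5
    return l / (l + 1)


for L in (12, 30, 60, 120):
    n = 2.0**L
    print(L, note2_ratio(L), (L - 3) / (L + 3), 1 - 6 / L, n ** (6 / L), n / n ** note2_ratio(L))
```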



Appendix B 

An analog of Spencer's result on counting 
extensions 



In this section, we prove an analog of a theorem of Spencer for counting the number of 
images of a rooted graph. 

If $(R, H)$ is a rooted graph (see Definitions 8.2, 8.3, and 8.4), define $\mathrm{Im}(H, G)$ to be the set of images of $H$ in $G$ and let $\mathrm{Num}(H, G) = |\mathrm{Im}(H, G)|$. Also, for $M$ some model (such as $\mathcal{G}(n,p)$ or $\mathcal{G}(n,p,k)$), define $\mu(H, M)$ to be the expected number of images of $H$ in $G \leftarrow M$.

Spencer [35] proves the following result for the random graph model $\mathcal{G}(n,p)$.

Theorem B.1 (Spencer) Let $(R, H)$ be strictly balanced on some constant number of vertices and let $\delta, c > 0$. Then, $\exists K > 0$ so that if $p$ is such that $\mu = \mu(H, \mathcal{G}(n,p)) \ge K\log n$, then for $G \leftarrow \mathcal{G}(n,p)$:

$$\Pr[(1-\delta)\mu < \mathrm{Num}(H, G) < (1+\delta)\mu] = 1 - o(n^{-c}).$$

In order to prove that the $l$-path algorithm of Chapter 7 works as claimed, we need an analog of Spencer's result (at least for the case of $H$ a path of some constant length $l$) for the model $\mathcal{G}(n,p,k)$. (As noted in Chapter 7, paths of length $l$ between two roots $x$ and $y$ are strictly balanced.) In fact, Spencer's proof goes through in the $\mathcal{G}(n,p,k)$ model with only minor modifications. We describe here what those modifications are and how they affect Spencer's proof.

Spencer's result is easiest to prove for the special (but main) case where there exists some sufficiently small $\epsilon$ so that the expected number $\mu$ of images of $H$ in $G$ is at most $n^\epsilon$; that is, when $K\log n \le \mu \le n^\epsilon$. To simplify our discussion, we will only consider that case here. We will also consider only rooted graphs $(R, H)$ that have no automorphisms fixing the roots. Spencer counts "extensions," which are essentially all the different maps of $H$ into $G$, whereas we count the images of $H$; for rooted graphs without such automorphisms, these are the same quantity. Note that paths of length $l$ fit into this category.

For $H$ a path of length $l$ between roots $x$ and $y$, we would like to prove that the number of images of $H$ in $G \leftarrow \mathcal{G}(n,p,k)$, given that $x$ and $y$ are chosen the same color or given that $x$ and $y$ are chosen of different colors, is in both cases within a factor $(1 + o(1))$ of the expectation. In order to not prove essentially the same theorem twice (once for each case), let us define the notion of a random $k$-colorable graph given a particular root coloring. For a root set $R$, there are $k^{|R|}$ different possible ways to assign $k$ colors to the $|R|$ vertices. So:

• Let $\mathcal{G}_R^j(n,p,k)$ be the model $\mathcal{G}(n,p,k)$ given that the subset $R$ of $V$ has the $j$th of $k^{|R|}$ possible colorings.

B.1 Modifying Spencer's result

Suppose $(R, H)$ is a rooted graph on a constant number of vertices and $X$ is some image of $H$ in $K_n$. Let $v = \mathrm{nonroots}(H)$ and $e = \mathrm{edges}(H)$. Then $\Pr[X \subseteq G \mid G \leftarrow \mathcal{G}(n,p)] = p^e$. If $H$ has no automorphisms, then $\mu(H, \mathcal{G}(n,p)) = n^v p^e(1 - o(1))$. The key fact that allows Spencer's argument to go through for $\mathcal{G}(n,p,k)$ is that if $H$ is also $k$-colorable, then $\Pr[X \subseteq G \mid G \leftarrow \mathcal{G}(n,p,k)] = \Theta(p^e)$. The reason is that since $H$ has only a constant number of vertices, there is a constant probability, at least $(1/k)^{|V(H)|}$, that in the creation of $G$ the vertices of $X$ are placed into color classes that legally color the graph. So, $\Pr[X \subseteq G] \ge (1/k)^{|V(H)|}p^e = \Theta(p^e)$.
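For a concrete small case, the Python sketch below computes by brute force the probability that uniformly random colors on the non-root vertices, together with fixed root colors, properly color $H$. It assumes, as in $\mathcal{G}(n,p,k)$, that vertex colors are uniform and independent and that each edge between differently colored vertices is then present independently with probability $p$. Under that assumption, $\Pr[X \subseteq G]$ is this probability times $p^e$. The example is a path $x - a - b - y$ with roots $x, y$ and $k = 3$.

```python
from fractions import Fraction
from itertools import product


def proper_probability(vertices, edges, k, fixed=None):
    """Pr[uniform random colors on the non-fixed vertices, together with the fixed
    root colors, form a proper k-coloring of H]. Brute force: tiny H only."""
    fixed = dict(fixed or {})
    free = [v for v in vertices if v not in fixed]
    good = 0
    for colors in product(range(k), repeat=len(free)):
        coloring = {**fixed, **dict(zip(free, colors))}
        if all(coloring[a] != coloring[b] for a, b in edges):
            good += 1
    return Fraction(good, k ** len(free))


# A path x - a - b - y with roots x and y, k = 3 colors.
V = ["x", "a", "b", "y"]
E = [("x", "a"), ("a", "b"), ("b", "y")]
same = proper_probability(V, E, 3, {"x": 0, "y": 0})
different = proper_probability(V, E, 3, {"x": 0, "y": 1})
print(same, different, Fraction(1, 3) ** len(V))  # 2/9 1/3 1/81
```

Both conditional probabilities are constants, and both comfortably exceed the crude lower bound $(1/k)^{|V(H)|}$ used in the text.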

We now describe how to modify Spencer's proof to prove the following result. 

Theorem B.2 Let $(R, H)$ be strictly balanced on some constant number of vertices with no automorphisms fixing the roots, and let $\delta, c > 0$. Then, there exist $K, \epsilon > 0$ so that if $\mu = \mu(H, \mathcal{G}_R^j(n,p,k)) \in [K\log n, n^\epsilon]$, then for $G \leftarrow \mathcal{G}_R^j(n,p,k)$:

$$\Pr[(1-\delta)\mu < \mathrm{Num}(H, G) < (1+\delta)\mu] = 1 - o(n^{-c}).$$

Proof: For convenience, let $M = \mathcal{G}_R^j(n,p,k)$, $v = \mathrm{nonroots}(H)$, $e = \mathrm{edges}(H)$, and let $G \leftarrow M$. Also, let $X_1, \ldots, X_m$ be the images of $H$ in $K_n$ and let $A_i$ be the event that $X_i \subseteq G$. We may assume that $H$ is $k$-colorable given that the root set has the $j$th possible assignment of colors, else $\mu$ would equal 0.

From our above observations, for any given image $X_i$, $\Pr[X_i \subseteq G] = \Theta(p^e)$ and so $\mu = \Theta(n^v p^e)$. For convenience, let us define $\bar{p}$ so that $\Pr[X_i \subseteq G] = \bar{p}^{\,e}$, and thus $\mu = (1 - o(1))\,n^v\bar{p}^{\,e}$.

Spencer's proof for $\mathcal{G}(n,p)$, where $\mu(H, \mathcal{G}(n,p)) \in [K'\log n, n^{\epsilon'}]$ for sufficiently large $K'$ and sufficiently small $\epsilon'$, proceeds in three stages. First, he proves a theorem stated here as Theorem 8.2 of Chapter 8. We have already proven the analog in Theorem 8.3.³ Second, he proves that for $G' \leftarrow \mathcal{G}(n,p)$, with probability $1 - o(n^{-c})$, the size of every maximal family $\mathcal{F}$ of disjoint $X_i$ in $G'$ is within a factor $(1 \pm \delta)$ of $\mu$. Finally, he shows that for any fixed maximal family $\mathcal{F}$, with probability $1 - o(n^{-c})$ there are only $O(1)$ images $X_i \notin \mathcal{F}$ in $G'$ that intersect some $X_j \in \mathcal{F}$. Since every image of $H$ must either belong to $\mathcal{F}$ or else intersect some $X_j \in \mathcal{F}$ (as $\mathcal{F}$ is a maximal family of disjoint images), the last two parts of Spencer's argument imply that $\mathrm{Num}(H, G')$ is within a factor $(1 \pm \delta)$ of $\mu$ with probability $1 - o(n^{-c})$.

3 Technically, Theorem 8.3 was proven for the semi-random model $\mathcal{G}_{SB}(n,p,k)$. However, since in $\mathcal{G}(n,p,k)$ there are w.h.p. $\Theta(n)$ vertices of each color class, the bound holds for this model as well.
Note that for $X$ any subgraph of $K_n$ at all,

$$\Pr[X \subseteq G \mid G \leftarrow M] \le \Pr[X \subseteq G' \mid G' \leftarrow \mathcal{G}(n,p)].$$

The reason is simply that, given the color classes, each edge is placed into $G \leftarrow M$ independently with probability either $p$ or 0 (depending on the colors of its endpoints), while in $\mathcal{G}(n,p)$ each edge is placed into the graph independently with probability exactly $p$. So, if we pick $\epsilon$ sufficiently small that $\mu(H, \mathcal{G}(n,p)) \le n^{\epsilon'}$, and thus Spencer's argument holds for $G' \leftarrow \mathcal{G}(n,p)$, then the third part of Spencer's argument carries over directly and we need not prove it again here. (Recall that $\mu = \Theta(n^v p^e)$, so for any $\epsilon' > 0$ there exists $\epsilon > 0$ such that $\mu \le n^\epsilon \Rightarrow n^v p^e \le n^{\epsilon'}$.) We focus now on the second part. The analysis here is taken directly from the proof of Spencer [35].

Let us first calculate some basic quantities. First, the number of $X_i$ in $K_n$ is at most $n^v$, so we can loosely upper bound the number of families $\mathcal{F} \subseteq \mathrm{Im}(H, K_n)$ of $t$ pairwise disjoint images $X_i$ by $\binom{n^v}{t}$. Also, for any fixed such family $\mathcal{F}$, the probability that $\mathcal{F} \subseteq \mathrm{Im}(H, G)$ (that is, that the $X_i$ in $\mathcal{F}$ are all in $G$) is $(\bar{p}^{\,e})^t$, since the $X_i \in \mathcal{F}$ are all disjoint, so the corresponding events $A_i$ are mutually independent.

For a given family $\mathcal{F}$ of $t$ pairwise disjoint $X_i$, we now upper bound the probability that no image $X_j$ disjoint from all $X_i \in \mathcal{F}$ exists in $G$; that is, the probability that $\mathcal{F}$ is a maximal family of disjoint images given that $\mathcal{F} \subseteq \mathrm{Im}(H, G)$. Let $X_{i_1}, \ldots, X_{i_r}$ be all the images in $\mathrm{Im}(H, K_n)$ disjoint from $\mathcal{F}$. We know that:

$$r = (n - tv - |R|)_v$$

since there are $(n - tv - |R|)$ non-root vertices not inside $\mathcal{F}$, and $H$ has no automorphisms. (Here $(x)_v = x(x-1)\cdots(x-v+1)$ is the falling factorial.) By Theorem 8.3, for $\epsilon$ sufficiently small, $\sum_{i \sim j}\Pr[A_i \wedge A_j] = o(1)$, where $i \sim j$ if $i \ne j$ and $E(X_i) \cap E(X_j) \ne \emptyset$. So, certainly the summation restricted to just the $i, j \in \{i_1, \ldots, i_r\}$ equals $o(1)$ as well. Thus, by Janson's inequality (noting that the "$\Delta$" term is $o(1)$), we have:

$$\Pr\Big[\bigwedge_{s=1}^{r}\overline{A_{i_s}}\Big] = [1 + o(1)]\prod_{s=1}^{r}\Pr\big[\overline{A_{i_s}}\big] = [1 + o(1)](1 - \bar{p}^{\,e})^r \quad (\text{by definition of } \bar{p}).$$

Given the above facts, we can upper bound the probability $P_t$ that there exists any maximal family $\mathcal{F}$ of $t$ pairwise disjoint $X_i$'s within $G$ by the quantity:

$$P_t \le [1 + o(1)]\binom{n^v}{t}(\bar{p}^{\,e})^t(1 - \bar{p}^{\,e})^{(n - tv - |R|)_v}.$$



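Consider now two cases. First, suppose $t \ge n^{2\epsilon}$. We may upper bound $\binom{n^v}{t}$ by $\frac{(n^v)^t}{t!} \le \left(\frac{3n^v}{t}\right)^t$. (Using 3 > 2.718... to avoid confusion with $e$.) So, since $n^v\bar{p}^{\,e} = O(n^\epsilon)$, and bounding the last factor $(1 - \bar{p}^{\,e})^{(\cdot)}$ by 1, we have:

$$P_t \le [1 + o(1)]\left(\frac{3n^v\bar{p}^{\,e}}{t}\right)^t \le [1 + o(1)]\left(\frac{O(n^\epsilon)}{n^{2\epsilon}}\right)^t = \big(O(n^{-\epsilon})\big)^t,$$

which, for $t \ge n^{2\epsilon}$, is smaller than any inverse polynomial in $n$. Thus, the probability there exists within $G$ a maximal family $\mathcal{F}$ of any size $t \ge n^{2\epsilon}$ is at most $o(n^{-c})$.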
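The estimate $\binom{N}{t} \le N^t/t! \le (3N/t)^t$ used in the first case follows from $t! \ge (t/e)^t$ and $e < 3$. The Python sketch below verifies it exhaustively for small $N$ with exact integer arithmetic, by checking the equivalent form $\binom{N}{t}\,t^t \le (3N)^t$. The range of $N$ is an arbitrary choice.

```python
import math


def binom_bound_holds(N, t):
    """Exact integer check of C(N, t) <= (3N/t)^t, written as C(N, t) * t^t <= (3N)^t."""
    return math.comb(N, t) * t**t <= (3 * N) ** t


print(all(binom_bound_holds(N, t) for N in range(1, 81) for t in range(1, N + 1)))
```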

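The second case is $t < n^{2\epsilon}$. For $\epsilon$ sufficiently small (at most 1/4) we have $t < n^{1/2}$, so $(n - tv - |R|)_v = n^v - O(tvn^{v-1}) \ge n^v - O(n^{v-1/2})$. Thus,

$$\begin{aligned}
(1 - \bar{p}^{\,e})^{(n - tv - |R|)_v} &\le (1 - \bar{p}^{\,e})^{n^v}(1 - \bar{p}^{\,e})^{-O(n^{v-1/2})} \\
&\le (1 - \bar{p}^{\,e})^{n^v}\big/\big(1 - O(\bar{p}^{\,e}n^{v-1/2})\big) \\
&\le (1 - \bar{p}^{\,e})^{n^v}\big/\big(1 - O(n^{\epsilon}n^{-1/2})\big) \\
&= (1 - \bar{p}^{\,e})^{n^v}[1 + o(1)].
\end{aligned}$$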

So, we can upper bound the probability $P_t$ by:

$$\begin{aligned}
P_t &\le [1 + o(1)]\binom{n^v}{t}(\bar{p}^{\,e})^t(1 - \bar{p}^{\,e})^{n^v} \\
&\le [1 + o(1)]\binom{n^v}{t}(\bar{p}^{\,e})^t(1 - \bar{p}^{\,e})^{n^v - t}.
\end{aligned}$$

Thus, $P_t \le (1 + o(1))\Pr[Y = t]$, where $Y$ has the binomial distribution $B(n^v, \bar{p}^{\,e})$. Let $\mu^* = n^v\bar{p}^{\,e}$. We know that for such a distribution, for any $\delta > 0$ we have $\Pr[|Y - \mu^*| > \frac{\delta}{2}\mu^*] = o(n^{-c})$, so long as $\mu^* \ge K\log n$ for sufficiently large $K$. Thus, the probability there exists any maximal family $\mathcal{F}$ of disjoint images $X_i$ of size not within $\delta\mu$ of $\mu$, and so not within $\frac{\delta}{2}\mu^*$ of $\mu^*$, is at most $o(n^{-c})$. This finishes the second part of Spencer's argument. Since, as noted above, the third part follows immediately from the result for $\mathcal{G}(n,p)$, we have proved the theorem. ■